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ABSTRACT 

The  theory  and  derivation  of  the  maximum-entropy 
method  of  spectral  analysis  and  the  Burp,  algorithm,  and 
the  potential  applicability  of  these  techniques  to  radar 
signal  processing,  are  reviewed.  This  material  is 
presented  in  a  readily  comprehensible  form  for  assessment 
by  the  practicing  radar  engineer.  Topics  such  as  compu¬ 
tational  complexity,  statistical  properties,  adaptive 
clutter  suppression  and  angular  spectrum  estimation  are 
discussed.  It  is  concluded  that,  at  least  at  present, 
maximum- entropy  spectral  analysis  is  not  a  panacea  for 
radar  signal  processing,  due  to  its  substantial  compu¬ 
tational  requirements,  and  the  relatively  unknown  and 
potentially  unfavorable  statistical  properties  of  data 
analyzed  using  the  Burg  algorithm.  A  promising  area  of 
applicability  appears  to  be  that  of  adaptive  Doppler 
filtering . 


1.  INTRODUCTION 


1.1  DOPPLER  AND  ANGULAR  SPECTRA  OF  RADAR  ECHOES 

Radar  echoes  received  from  reflectors  having  a  radial  component  of 
velocity  with  respect  to  the  radar  have  their  frequency  spectra  shifted  because 
of  the  Doppler  effect.  Such  Doppler  shifts  can  in  principle  be  used  to 
distinguish  between  wanted  targets  and  unwanted  "clutter"  echoes,  if  the 
characteristics  of  the  motions  of  the  targets  and  clutter  are  sufficiently 
different  to  render  their  Doppler  spectra  distinguishable  by  the  radar  signal 
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processor.  Examples  of  clutter  are  echoes  from  mountains,  raindrops,  the 
ground,  trees,  birds,  the  surface  of  the  ocean,  insects,  etc..  Sometimes 
although  such  clutter  targets  are  not  of  principal  concern,  it  is  desirable  to 
be  aware  of  their  presence,  such  as  when  flocks  of  birds  suddenly  appear  near 
aircraft  flight  and  approach  paths. 

Radars  are  restricted  in  range  and  angular  resolution  due  to  their  finite 
signal  bandwidths  and  non-zero  antenna  beamwidths.  This  means  that  radars  have 
only  finite  range  and  angular  resolution,  so  that  targets  within  a  "resolution 
cell"  are  not  resolved  but  are  seen  as  a  single  echo.  Targets  of  different 
types  contained  in  the  same  resolution  cell,  however,  can  have  their  own 
characteristic  Doppler  spectra.  Therefore,  if  it  is  possible  to  examine  the 
Doppler  spectrum  of  echoes  from  within  a  given  cell,  knowledge  of  the  targets 
in  that  cell  can  be  inferred. 

A  spectral  estimation  problem  also  arises  in  the  case  of  sampled-aperture 
radar  systems.  Radar  echoes  are  received  by  an  array  of  antennas  as  a  super¬ 
position  of  electromagnetic  waves.  Each  wave  varies  sinusoidally  across  the 
array  with  a  spatial  frequency  that  is  proportional  to  the  sine  of  its  angle- 
of-arrival  with  respect  to  the  boresight  of  the  array.  The  angles-of -arrival 
of  the  various  echoes  can  be  inferred  if  the  spatial  frequency  spectrum  of 
such  data  can  be  estimated. 


1.2  THE  REQUIREMENT  TO  ESTIMATE  SPECTRA  USING  AS  FEW  DATA  AS  POSSIBLE 

For  the  case  of  either  Doppler  or  angular  spectrum  estimation,  the 
number  of  contiguous  data  is  limited.  For  the  case  of  angular  spectrum 
estimation,  this  number  it  equal  to  the  number  of  receivers.  For  the  case  of 
Doppler  spectrum  estimation,  the  number  of  contiguous  data  is  limited  by  the 
number  of  pulses  used  to  illuminate  a  given  resolution  cell,  or  "the  number 
of  hits  on  target".  For  a  surveillance  radar,  the  number  of  hits  on  target 
is  usually  determined  by  design  factors  such  as  how  often  it  is  required  to 
search  the  total  surveillance  volume,  the  antenna  beamwidth,  and  the  pulse- 
repetition  frequency.  For  a  tracking  radar,  it  might  well  be  possible  to 
increase  the  number  of  hits  on  target  in  volumes  of  particular  interest, 
where  a  wanted  target  is  suspected  to  exist. 

In  any  of  these  cases,  the  spectral  characteristics  of  the  data  may  be 
changing  with  time,  so  that  the  data  are  not  "stationary".  Thus  for  several 
reasons  it  is  desirable  to  be  able  to  estimate  spectra  using  as  few  data  as 
possible. 


1.3  A  SUMMARY  OF  CLASSICAL  FOURIER  SPECTRAL  ANALYSIS  AND  ITS  LIMITATIONS 

The  classical  methods  for  estimating  spectra  are  based  on  the  Fourier 
transform.  These  Fourier  techniques  are  equivalent  to  least-squares  fitting 
a  predetermined  set  of  sine  waves  to  any  given  set  of  data.  Such  data  can 
consist  either  of  samples  of  amplitude  as  a  function  of  time,  such  as  are 
obtained  from  a  coherent  radar  when  a  set  of  successive  echoes  from  a  given 
range  are  considered,  or,  of  samples  of  an  estimate  of  the  autocorrelation 
function  derived  by  averaging  lagged  cross-products  of  the  sampled  amplitude 
data . 
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In  the  Fourier  analysis  of  either  amplitude  or  autocorrelation  data, 
the  amplitudes  and  phases  of  the  predetermined  set  of  sine  waves  are  adjusted 
to  minimize  the  mean-square  difference  between  the  data  and  the  fitted  sine 
waves.  In  practice  this  fitting  operation  is  simply  the  discrete  Fourier 
transform. 

The  resolution  and  statistical  properties  of  the  Fourier  transform 
methods  are  well  known  (e.g.,  [1]).  For  example,  if  it  is  known  that  a  signal 
to  be  analyzed  consists  of  two  sine  waves  of  different  frequencies,  then  it 
becomes  difficult  to  estimate  the  difference  between  these  frequencies  when 
this  difference  is  less  than  the  reciprocal  of  the  length  of  the  data  record. 
That  is,  if  the  length  of  a  data  record  is  T  seconds,  then  spectral  details 
having  extent  in  the  frequency  domain  of  less  than  1 /T  Hz  are  obscured.  It 
Is  also  well  known  that  the  discontinuities  at  the  ends  of  the  data  record 
cause  side  lobes  in  the  Fourier  spectrum,  an  effect  known  as  "spectral  leakage", 
and  that  spectral  leakage  can  be  controlled  by  multiplying  the  data  with  a 
windowing  function  before  performing  the  Fourier  transform.  The  penalty 
exacted  by  such  windowing  is  a  further  loss  in  spectral  resolution,  so  that 
spectral  details  of  width  greater  than  1/T  are  obscured. 

Statistically,  it  is  known  that  if  sampled  amplitude  data  comprised  of 
noise  with  Gaussian  statistics  are  to  be  analyzed,  then  estimates  of  the  power 
spectrum  separated  in  frequency  by  1/T  are  statistically  independent  and  have 
standard  deviations  equal  to  their  means.  Statistical  reliability  can  be 
improved  either  by  averaging  power  spectra  derived  from  non-overlapping  sets 
of  data,  or  by  using  windowing  functions  to  reduce  the  spectral  resolution. 

In  summary,  then,  it  can  be  seen  that  the  spectral  resolution  capability 
of  the  classical  Fourier  transform  method  is  inherently  limited  to  being 
approximately  the  reciprocal  of  the  length  of  the  data  record  or  worse,  so  that 
even  if  it  is  known  on  a  physical  basis  that  there  is  information  in  the 
spectrum  of  bandwidth  less  than  this  limit,  the  technique  will  cause  such  detail 
to  be  obscured.  Such  is  the  case  for  example  when  target  echo  classification 
on  the  basis  of  Doppler  spectral  estimates  is  attempted  by  using  classical 
Fourier  spectral  analysis  on  data  sets  that  are  "too  short". 


1.4  ALTERNATIVE  METHODS  FOR  SPECTRAL  ANALYSIS 

Within  the  past  decade,  some  alternative  methods  for  spectral  analysis 
have  found  considerable  success  in  the  fields  of  seismic  and  oil-exploration 
geophysics,  and  in  speech  analysis  and  transmission.  These  alternative  methods 
have  also  aroused  interest  in  the  sonar  and  radar  communities. 

The  alternative  methods  differ  fundamentally  from  the  Fourier  methods. 
The  Fourier  methods  tacitly  assume  that  the  data  have  been  generated  by  a  set 
of  independent  harmonic  generators  having  frequencies  (0/T) ,  (Jl/T),  (12/T), 
etc.,  where  T  is  the  length  of  the  set  of  data  being  analyzed.  The  alternative 
methods  to  be  discussed  here  are  based  on  a  different  tacit  assumption.  That 
assumption  is:  given  a  set  of  data,  it  is  possible  to  design  a  filter  which 
has  as  its  output  random  "white"  noise  when  the  data  is  applied  to  its  input . 
Such  a  filter  is  known  as  a  "whitening"  filter.  An  estimate  of  the  spectrum 
of  the  data  can  then  be  taken  to  be  proportional  to  the  inverse  of  the  power 
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transfer  function  of  the  whitening  filter  (see  Figure  1).  The  spectral 
estimation  techniques  thus  consist  of  techniques  for  designing  whitening  filters 
using  only  the  available  data.  This  approach  arises  naturally  in  the  course  of 
the  formulation  of  maximum-entropy  method. 


INPUT  DATA  _ _ _  WHITE  NOISE 


SEQUENCE 

INPUT 

WHITENING 

WITH  POWER 

WITH  POWER 

FILTER 

OUTPUT 

SPECTRUM  P 

SPECTRUM  P(f) 

TRANSFER 

(CONSTANT) 

FUNCTION  W(f) 


P<f)  =  P/|W(f)|2 


Figure  1.  Whitening-Filter  Method  of  Power  Spectrum  Estimation 


For  speech  analysis,  a  model  for  the  whitening  filter  can  be  based  on 
the  known  physical  characteristics  of  the  human  voice.  Similarly,  for  geo¬ 
physical  signal  processing,  knowledge  of  the  physical  processes  generating  the 
signals  to  be  analyzed  is  of  aid.  For  the  case  of  radar  signals,  it  is  not 
yet  clear  whether  or  what  physical  models  can  be  used  to  provide  a  priori 
information  regarding  the  generation  of  the  whitening  filters. 


1.5  THE  MAXIMUM-ENTROPY  METHOD 

A  technique  which  leads  to  the  generation  of  a  whitening  filter  and 
which  has  helped  stimulate  current  research  in  the  area  of  radar  applications 
is  known  as  the  maximum-entropy  method.  The  details  of  this  method  and  its 
theoretical  formulation  are  outlined  in  Section  2. 

It  must  be  noted  that  the  "true"  maximum-entropy  method  is  based  on  the 
assumption  that  not  time-series  amplitude  data,  such  as  would  be  obtained  from 
a  sequence  of  radar  echoes  from  a  given  range,  azimuth  and  (perhaps)  elevation 
cell  are  available,  but  rather  that  a  set  of  autocorrelation  data  is  available 
Such  autocorrelation  data  are  not  available  from  present-day  radars. 


1.6  THE  BURG  ALGORITHM 

To  obviate  the  need  for  estimating  autocorrelation  data,  the  Burg  algo¬ 
rithm  was  developed.  The  details  of  this  algorithm  and  its  theoretical 
foundations  are  reviewed  in  Section  3.  The  Burg  algorithm  generates  whitening 
filters  directly  from  time-series  amplitude  data,  in  a  relatively  efficient 
manner  which  reflects  the  gist  of  the  maximum-entropy  technique. 
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1.7  STATISTICAL  BEHAVIOR  OF  THE  MAXIMUM-ENTROPY  AND  BURG  TECHNIQUES 

Radar  detection  is  closely  related  to  statistical  decision  theory. 
Therefore  it  is  necessary  to  understand  the  statistical  effects  of  any  procedure 
applied  to  the  data  prior  to  where  any  detection  decision  is  made.  To  date,  the 
statistical  behavior  of  spectra  derived  using  the  maximum-entropy  method  have 
been  studied  only  asymptotically  [2],  and  the  statistical  behavior  of  Burg 
spectra  have  been  studied  only  by  means  of  computerized  Monte  Carlo  simulations 
using  simulated  data  [3,4]  or  very  limited  quantities  of  real  data  [3,3].  It 
is  on  the  basis  of  its  statistical  behavior  that  the  applicability  of  the  Burg 
method  to  radar  signal  processing  will  be  confirmed  or  rejected.  This  topic, 
which  is  presently  unresolved,  is  discussed  in  Section  3.2  and  Section  5. 


2.  THE  BASIS  AND  DERIVATION  OF  THE  MAXIMUM-ENTROPY  METHOD 
OF  SPECTRAL  ANALYSIS* 

The  maximum-entropy  method  of  spectral  analysis  is  based  on  a  line  of 
reasoning  proposed  by  J.P.  Burg  in  the  1960s  [6,7]  and  finally  published  in 
detail  by  him  in  1975  [8],  He  began  by  supposing  that  we  have  available  M+l 
ewov-fvee  autocorrelation  data  R(mAt)  for  a  set  of  M+l  lags: 

R(0) ,R(At) ,R(2At) , . . . ,R(MAt) .  He  proceeded  to  show  that,  instead  of  simply 
Fourier  transforming  this  autocorrelation  data  to  obtain  a  power  spectrum,  it 
is  possible  to  devise  an  alternative  procedure  based  on  finding  a  power 
spectrum  which  is  both  consistent  with  the  given  autocorrelation  data  and  also 
has  maximum-entropy  in  the  sense  defined  by  Shannon's  Information  Theory.  The 
advantage  of  this  technique  is  supposed  to  be  that  it  is  not  subject  to  the 
smoothing  effects  introduced  when  a  truncated  series  of  autocorrelation  data 
are  Fourier  transformed.  This  Section  outlines  the  details  of  the  derivation 
of  the  maximum-entropy  method  of  spectral  analysis  and  shows  how  it  differs 
from  conventional  Fourier  techniques.  It  also  lays  the  foundation  for  Section  3, 
where  the  Burg  algorithm  for  the  spectral  analysis  of  amplitude  time-series 
data  is  described  and  derived. 


2.1  DEFINITION  OF  THE  AUTOCORRELATION  FUNCTION 

The  autocorrelation  function  of  a  time-dependent  signal  x(t)  can  be 
defined  as 


R(mAt)  =  <x* (t)x(t+mAt)> 


(2.1) 


where  the  brackets  <  >  denote  the  expected  value  of  the  quantity  they  contain. 
The  asterisk  *  denotes  that  if  the  data  corresponding  to  x(t)  are  complex 
(that  is,  if  in-phase  and  quadrature  components  are  available),  then  the  complex 
conjugate  is  to  be  taken.  The  time  variable  t  can  be  either  discrete  or 
continuous,  but  it  is  assumed  that  the  autocorrelation  data  are  available  at 


*  Sections  2  and  3  are  based  closely  on  the  derivation  contained  in  Chapter 
II. A  of  Burg’s  Ph.D.  Thesis  [8], 
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the  discrete  time-lag  intervals  mAt ,  where  m  is  a  positive  or  negative  integer, 
or  zero. 

Notice  that  the  definition  of  R(mAt)  implies  that  R(mAt)  is  independent 
of  the  actual  times  at  which  the  signal  is  observed.  This  means  that  the 
data  x(t)  are  assumed  to  have  statistics  that  are  stationary  and  do  not  vary 
with  the  passage  of  time.  The  supposition  that  such  autocorrelation  data  are 
available  is  usually  unrealistic  in  the  radar  situation,  but  in  some  sonar 
and  geophysical  situations,  good  estimates  of  the  autocorrelation  function  can 
be  made. 


2.2  THE  FOURIER-TRANSFORM  RELATION  BETWEEN  THE  AUTOCORRELATION  FUNCTION  AND 
THE  POWER  SPECTRUM 

In  general,  a  sampled  autocorrelation  function  R(mAt)  and  its  associated 
power  spectrum  P(f),  where  f  is  frequency,  are  related  by  the  Fourier  transform 
pair 


and 


CO 

P(f)  =  At  £  R(mAt)exp(-j  2iTfmAt)  (2.2) 

m=-oo 


R(mAt) 


l/2At 

P(f  )exp(j  2irfmAt)df 

-l/2At 


(2.3) 


We  note  that,  since  power  is  always  real,  eqns.  (2.2)  and  (2.3)  each  imply 
that 


R(-mAt)  =  R*(mAt) 


(2. A) 


This  result  is  also  consistent  with  the  definition  of  R(mAt)  as  given  by 
eqn.  (2.1).  We  also  note  that  the  range  of  integration  is  over  ±  l/2At  ra> her 
than  ±°°,  since  the  data  are  sampled. 

If  we  have  available  only  (M+l)  autocorrelation  data,  we  could  obtain 
an  estimate  of  the  power  spectrum  by  simply  truncating  the  infinite  sum  of 
eqn.  (2.2)  at  m  =  ±M  and  arbitrarily  setting  R(mAt)  to  zero  for  m  =  ±(M+1)  and 
beyond.  But  this  procedure  introduces  a  step  discontinuity  into  the  auto¬ 
correlation  data,  which  can  lead  to  problems  in  the  form  of  physically 
unrealistic  estimates  of  negative  power  at  some  frequencies.  The  step  discon¬ 
tinuities  can  be  thought  of  as  causing  a  "ringing"  in  the  spectral  domain. 

Another  way  to  look  at  this  is  to  consider  that  the  autocorrelation 
data  have  been  modulated  (or  multiplied)  by  a  square-wave  gate  function.  This 
means  that  the  true  power  spectrum  corresponding  to  the  untruncated  autocor¬ 
relation  data  has  been  convolved  with  the  spectrum  of  the  square-wave  gating 
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function,  and  it  is  this  latter  spectrum,  which  has  both  positive  and 
negative  amplitudes,  that  introduces  the  apparent  negative  powers  into  the 
estimated  power  spectrum. 


This  problem  of  negative  power  estimates  can  be  sidestepped  by  choosing 
to  multiply  the  autocorrelation  data  by  some  weighting  function  which  decays 
to  zero  at  m  *»  ±M  and  has  no  negative  amplitude  components  in  its  frequency 
spectrum.  A  typical  example  is  the  triangular  weighting  function,  which  has 
its  maximum  value  at  m=0  and  decays  linearly  to  zero  at  m  =  +(M+1).  Such 
non-uniform  weighting  eliminates  spurious  negative  power  estimates,  but  at 
the  expense  of  smoothing  the  estimated  power  spectrum  more  than  does  the 
simple  uniform  square-wave  weighting.  There  is  no  way  of  avoiding  this 
tradeoff  between  stability  and  smoothing  when  using  classical  Four ier-trans- 
form  processing. 


2.3  THE  MAXIMUM-ENTROPY  CRITERION 

Burg's  approach  was  to  try  to  estimate  the  spectrum  of  the  process 
generating  the  given  autocorrelation  data  without  making  any  a  priori  assump  • 
tions  about  the  unknown  data  for  lags  ±(M+l)At  and  beyond.  He  began  by  noting 
that  there  are  an  infinite  number  of  power  spectra  which  can  be  Fourier  trans¬ 
formed  using  eqn.  (2.3)  to  yield  autocorrelation  functions  identical  to  the 
given  set  of  sampled  data  over  the  range  of  lags  for  which  the  data  are 
available.  Beyond  that  range,  of  course,  all  these  autocorrelation  functions 
will  differ. 

The  problem  then  becomes:  which  one  of  this  infinite  set  of  power 
spectra  to  choose?  To  answer  this  question,  Burg  used  the  criterion  of 
"information  entropy"  as  defined  for  power  spectra  by  Shannon's  Information 
Theory  [9],  A  good  discussion  of  the  motivation  for  this  choice  can  be 
found  in  Ref.  [10],  This  entropy  H  is  defined  to  within  a  scale  factor  by 
the  formula 

Si/ 2At 

loge[P(f)]df  (2.5) 

-l/2At 

Let  it  be  assumed  that  all  the  power  spectra  under  consideration  are 
constrained  to  have  the  same  total  power  PQ ,  where 

l/2At 

Pq  =  j  P  (f )df  (2.6) 

J-l/2At 

Then  for 

P(f)  =  P  At  (2.7) 

o 

that  is,  P(f)  is  a  flat  or  a  white  power  spectrum,  the  entropy  H  has  its 

maximum  value  H  ,  where 

o’ 
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H  =  log  (P  At) 
o  e  o 


(2.8) 


We  also  note  by  comparing  eqn.  (2.6)  with  eqn.  (2.3)  when  m=*0  that 

P  «*  R (0)  (2.( 

o 

It  Is  relatively  straightforward  to  convince  ourselves  that  any  non¬ 
white  spectrum  will  have  smaller  entropy  than  a  white  spectrum  having  the 
same  total  power.  We  can  start  by  noting  that  any  power  spectrum  P(f)  must 
have  a  mean  value  of  PQAt,  from  eqn.  (2.6).  Then  P(f)  can  be  expressed  in 
the  form 


P(f)  =  PQAt  [1  +  Pl(f)] 


(2.10) 


so  that 


,l/2At 


H  =  H  +  At 
o 


loge[l  +  Pl(f)]df 


(2.11) 


-1/ 2At 


For  the  sake  of  simplicity  we  will  assume  that  the  magnitude  of  p^(f)  is  much 
less  than  one,  so  that 


!P1(f>!  «  1 


(2.12) 


and  we  can  write 


logjl  +  px(f)]  =  P1(f)  -  j  p^(f)  +  ... 


(2.13) 


Then  from  eqn.  (2,10)  we  can  write 


H-H  =  At 
o 


1/ 2At 


-l/2At 


[pi(f)  ~  j  pi(f)  +  •••]df 


(2.14a) 


1/ 2At 


p2(f)df  <  0 


-1/ 2At 


(2.14b) 


where  we  have  noted  from  eqn.  (2.10)  that  the  average  value  of  Pj>(f)  must  be 
zero.  Thus  we  see  that  H<H0,  so  that  for  a  given  signal  power  level,  a  white 
noise  signal  indeed  has  maximum  entropy.  Although  we  have  only  proven  this 
result  for  a  special  case,  it  is  true  in  general. 


This,  then,  is  the  criterion  that  Burg  used  to  select  one  of  the 
infinite  sets  of  possible  power  spectra:  in  the  absence  of  any  further 
information  about  the  process  generating  the  data t  a  reasonable  choice  for 
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the  power  spectrum  would  be  one  which  both  sot  Is  tied  the  const  mints  imposed 
by  the  given  autocorrelation  data  and  also  had  the  maximum  entropy,  as 
defined  by  eqn.  (2,5). 

We  have  already  seen  that,  where  there  are  no  constraints  due  to  the 
autocorrelation  (unction,  a  white  noise  spectrum  Is  chosen,  since  white  noise 
has  the  greatest  possible  entropy.  White  noise  Is  charact or  1  zed  by  Its  random 
and  unpredictable  nature.  Thus,  by  Inference,  choosing  a  power  spectrum  which 
Is  constrained  by  the  autocorrelation  data  and  yet  has  maximum  entropy  In  the 
taco  of  these  constraints,  suggests  that  this  power  spectrum  corresponds  to 
the  most  random  possible  process  consistent  with  the  observed  data.  Our  next 
task  is  to  find  out  how  to  estimate  this  spectrum. 


2.4  THL  CONSTRAINT  LN1 ROPY-MAX  1MIZA1  ION  PROBUM  AND  1111  FUNCTIONAL  I  ORM  01 
THL  SPLCTRAL  LSI  IMA  11. 

We  can  estimate  the  maximum-entropy  spectrum  by  setting  up  and  then 
solving  a  constrained  maxlml/.a(  Ion  problem.  This  can  ho  done  bv  using 
Lagrange's  undetermined  multipliers.  We  begin  bv  defining  the  Inaction 

*[P(f)l: 


*lr(0) 


1  /  2At 
^-1/2A 


log  P(t)dl 
e 


m 

+ 

m~-M 


\ 


m 


R (m At) 


1/2  At 

l’(l  )exp(J2:itniAt  )dl 

-l/2At 


(.'.!!)) 


where  the  \ms  are  the  undetermined  multipliers,  and  the  terms  in  the  square 
bracket  are  the  constraint  conditions  derived  t  rom  eqn.  (2.1),  which  spec  I  - 
t  les  that  the  Fourier  transform  of  the  power  spectrum  must  match  the  M+l 
given  values  of  the  autocorrelation  function. 


It  wo  now  take  the  derivative  ol  4*  1 1’  ( I  )  I  with  respect  to  l’(l)  and  set 
the  result  equal  to  zero,  we  get 


At 


1 

I 


1  /  2  A  t 


i’lt) 


lit 


1  /  2At 


ni“-M 


—  I  /  2  A  t 

Solving  eqn.  (2.1b)  tor  I* (I  )  yields 


f 

•’  _  i  /  > 


exp  (  |  2  ii  t  mAt  )d  I  ■*  0 


(2.1b) 


-1/2AI 


M 

r(f)  “  At  /  >:  \  exp  (  |  2>i  t  mAt  ) 

m~- M 


(2.1/' 


LO 


and  we  now  have  the  functional  form  of  the  maximum  entropy  estimate  of  P(f). 
But  we  have  yet  to  determine  the  Ams  in  terras  of  the  autocorrelation  data. 

We  already  know  that  if  P(f)  is  a  power  spectral  density,  then  P(f) 
must  be  real  and  positive.  This  means  that  we  must  have 


A  =  A*  (2.18) 

-m  m 


so  that  the  imaginary  terms  in  the  denominator  of  eqn.  (2.17)  cancel, 
we  can  write 


-1 

A  exp(l2nfmAt)  *»  P 
m  M 

m=-M 


1 


M 

£  a(m,M)exp(-j2irfmAt) 
m=l 


? 


Then 


(2.19) 


where  the  M+l  Independent  unknown  A^  have  been  replaced  by  the  real  unknown 
P[4  and  the  M  complex  unknown  a(m,M)s.  It  will  turn  out  that  an  explicit 
solution  for  the  Ams  is  not  required;  it  is  only  important  that  we  can  write 
P(f)  as 


P(f) 


m=l 


VC 


a(m,M)exp(-j2irfmAt) 


(2.20) 


2.5  DETERMINATION  OF  THE  COEFFICIENTS  IN  THE  SPECTRAL  ESTIMATE  FROM  THE 
AUTOCORRELATION  DATA 


Now  we  are  faced  with  the  problem  of  how  to  determine  the  a(m,M)s  from 
the  R(mAt)s.  We  can  do  this  by  going  back  to  eqn.  (2.3)  and  evaluating  the 
integral  using  P(f)  as  given  by  eqn.  (2.20).  We  can  do  this  most  easily  by 
changing  variables  as  follows.  We  let 


z  =  exp(j2irfAt) 


so  that 


df 


z  dz 
j  2irAt 


(2.21) 


(2.22) 


We  note  that  for  f  =  ±l/2At,  the  corresponding  values  for  z  is  z=-l  in  both 
cases.  Thus  the  integral  of  eqn.  (2.3)  has  been  transformed  into  a  contour 
integral  around  the  unit  circle  in  the  complex  z-plane,  and  can  be  written  as 


R(mAt) 


m-1 

z  dz 


'  M  _k"l  " 

£  a(k,M)z 
k-0 


M 

T.  a*(k,M)z 
k=0 


c 


(2.23) 


where  we  have  defined 
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a(0,M)  =  -1  (2.24) 

to  simplify  the  notation.  Next,  using  eqn.  (2.23),  we  form  the  summation 


M  f  z  dz 

j2irj  f  M  ' 

-  £  a*(k,M)z 

k=0 


(2.25b) 


We  note  without  proof  that  it  is  always  possible  to  formulate  the  polynomial 
in  the  denominator  of  eqn.  (2.25b)  so  that  it  does  not  have  any  zeroes  inside 
the  unit  circle.  Then  for  r  >_  1,  the  integral  of  eqn.  (2.25b)  is  zero,  by 
Cauchy's  integral  theorem.  For  the  case  r=0,  we  observe  that  there  is  a 
single  pole  at  z=0.  For  the  case  of  a  single  pole,  Cauchy's  integral  theorem 
can  be  stated  as 


_|_j  f^.dz  =  f(0) 


(2.26) 


so  that  the  right-hand  side  of  eqn.  (25b)  is  equal  to  PM,  since  f(0)  =  -a(0,M) 

=  -(-1)  =1. 


Finally  we  have  the  system  of  W-l  complex  equations  in  M+l  unknowns 
required  to  be  solved  to  determine  the  M  complex  a(m,M)s  and  the  real  unknown 

PM= 

M 

R(r)  -  £  a(ra,M)R(r-m)  =  (r=0)  (2.27a) 

m=l 


M 

R(r)  -  £  a(m,M)R(r-m)  =  0  (r=l,2, 3. . . ,M)  (2.27b) 

m=l 

(From  now  on  we  will  often  set  At=l  to  simplify  the  notation.) 


2.6  THE  MAXIMUM-ENTROPY  SPECTRAL  ESTIMATE  AND  PREDICTION-ERROR  FILTERS 

The  set  of  coefficients  {l,-a(l,M) ,-a(2,M) , . . . ,-a(M,M) }  in  eqns.  (2.27a) 
and  (2.27b)  define  what  is  called  a  prediction-error  filter  (or  PEF)  of  order 
M  and  length  (Mfl).  This  prediction-error  filter,  as  the  name  suggests,  uses 
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the  M  values  of  the  autocorrelation  function,  R(-M)  to  R(-l) ,  to  predict  an 
estimated  value  for  R(0);  then  it  subtracts  this  estimate  from  the  actual 
value  of  R(0)  and  outputs  the  difference,  P^.  In  this  manner  the  PEF  attempts 
to  account  for  all  the  predictable  behavior  of  the  autocorrelation  function 
that  can  be  expressed  in  terms  of  a  weighted  sum  of  M  of  its  terns.  is 
the  power  that  cannot  be  accounted  for  on  the  basis  of  this  model. 

The  eqns.  (2.27b)  can  be  used  to  extrapolate  the  autocorrelation 
function  beyond  the  limit  r=M.  This  extrapolated  autocorrelation  function  is 
the  Fourier  transform  of  the  maximum-entropy  power  spectrum,  as  required  by 
Fourier  transform  relation  of  eqn.  (2.3).  This  extrapolation  of  the  auto¬ 
correlation  function  is  optimum  only  in  the  sense  that  it  satisfies  the 
constrained  entropy  maximization  criterion.  The  usual  radar  signal  processing 
approach  of  matched  filtering  has  nothing  whatever  in  common  with  this  result. 

Examination  of  eqn.  (2.20)  shows  that  the  maximum-entropy  power  spectrum 
is  inversely  proportional  to  the  squared  magnitude  of  the  Fourier  transform 
of  the  prediction-error  filter  coefficients;  that  is,  if  we  write 

M  -1 

F(z,M)  =  1  -  £  a (m,M) z  (2.28) 

m=l 

where 

z  *  exp(j27rfAt)  (2.21) 

then  we  can  rewrite  eqn.  (2.20)  for  the  maximum-entropy  power  spectrum  as 

P(f)  =  PMAt/jF(z,M)|2  (2.29) 

This  means  that  the  PEF  can  be  thought  of  as  a  whitening  filter,  for  if  we 
apply  a  signal  which  has  a  power  spectrum  P(f)  to  the  input  terminals  of  such 
a  PEF,  then  the  power  spectrum  of  the  output  signal  would  be  Pj^At,  which  is  a 
constant,  independent  of  frequency.  Constant  power  spectra  correspond  to 
white  noise;  hence  the  name  whitening  filter. 


2.7  SOLVING  FOR  THE  COEFFICIENTS  OF  THE  MAXIMUM-ENTROPY  PREDICTION-ERROR 
FILTER 

The  set  of  eqns.  (2.27a)  and  (2.27b)  can  be  written  in  matrix  form: 
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R(0) 

R(-l) 

R<-2) 

R(l) 

R(0) 

R(-l) 

• 

R(2) 

R(l) 

R(0) 

R(-M)  1 

-a(l,M) 

-«(2,M) 

•  • 

R’C-l)  -a(M-l,M) 

R(0)  -ot(M,M) 


PM 

0 

II 

0 

0 

0 

(2.30) 


It  can  be  seen  that  the  matrix  of  autocorrelation  data  has  a  highly  ordered 
structure.  This  structure,  in  which  the  elements  of  each  diagonal  are 
identical  is  called  Toeplitz  by  mathematicians.  In  such  an  (M+l)x(M+l) 
Toeplitz  matrix  there  are  (M+l)2  entries,  only  2M+1  of  which  are  distinct. 

For  the  matrix  of  eqn.  (2.30),  because  of  the  conjugate-symmetry  relation  of 
eqn.  (2.4),  there  are  only  M+l  independent  elements  (i.e.,  the  matrix  is  also 
Hermit ian) . 

Usually  it  is  necessary  to  solve  a  matrix  equation  such  as  eqn.  (2.30) 
by  inverting  the  matrix.  When  the  matrix  is  Toeplitz,  however,  such  equations 
can  be  solved  by  a  relatively  simple  procedure  called  the  Levinson  recursion 
algorithm. 

We  begin  by  using  only  the  term  R(0).  Then  the  PEF  has  only  one  term, 
and  eqn.  (2.30)  becomes 


R(Q)xl  =  Pr 


(2.31) 


Next  we  shall  see  how,  as  we  take  each  value  R(l) ,R(2) , . . . ,R(M)  in  turn,  we 
can  uniquely  extend  the  length  of  the  PEF  by  one  coefficient  each  time,  in 
a  highly  efficient  manner.  We  start  the  derivation  of  the  recursion 
algorithm  by  assuming  that  we  have  already  solved  the  matrix  equation 


R(0)  R(-l) 

R(l)  R(0) 


R(-K)l  p 

-o(l,K) 
•  • 

•  • 

If  (0)  -a  (K,K) 


(2.32) 


for  some  value  of  K,  K=0,l,2, . . . ,M-1,  and  that  we  have  obtained  the  solution 
for  the  PEF  of  order  K  and  length  K+l.  We  observe  that  we  can  rewrite  eqns. 
(2.27a)  and  (2.27b),  by  taking  their  complex  conjugate  and  using  eqn.  (2.4), 
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M 

R(-r)  -  £  a*(m,M)R(m-r)  =  Pu  (r=0)  (2.33a) 

i  M 

m«l 


M 

R(-r)  -  £  a*(m,M)R(m-r)  =  0  (r=l,2, . . . ,M) 

m=l 

Equations  (2.33a)  and  (2.33b)  can  also  be  written  in  matrix  form: 


R(0)  R(-l)  R(-2)  ...  R(-M) 

-a*  (M,M) 

0 

R(l)  R(0)  R(-l) 

« 

-a*  (M-l  ,M) 

0 

R(2)  R( 1 )  R(0)  *•.  ! 

•  •  • 

•  •  • 

*  •  •  • 

-a* (M-2 ,M) 

• 

0 

•  •  • 

!  *  *  •  .  R(0)  R(-l) 

• 

-a*(l,M) 

• 

• 

0 

R(M)  .  R(l)  R(0) 

1 

PM 

(2.33b) 


(2.34) 


If  we  compare  eqns.  (2.34)  and  (2.30),  we  can  see  that  the  same  results  are 
obtained  when  the  sequence  of  PEF  coefficients  is  reversed  and  the  coefficients 
are  conjugated.  This  observation  is  the  key  to  the  Levinson  recursion. 


We  want  the  output  of  each  successively  higher  order  PEF  to  have  the 
same  functional  form;  that  is,  an  output  sequence  with  one  non-zero  term  and 
with  all  other  terms  being  zeroes.  Therefore,  using  eqns.  (2.32)  and  (2.34), 
we  write 


~R(o) 

R(-l) 

R(-K) 

R(-K-l) 

/ 

1 

0 

R(l) 

R(0) 

• 

R(-K+l) 

R(-K) 

l 

-a(l,K) 

-a* (K,K) 

• 

• 

• 

• 

• 

• 

• 

• 

• 

• 

• 

l 

• 

• 

• 

-a(K+l,K+l) 

• 

R(K) 

R(K-l) 

• 

R(0) 

R(-l) 

I 

-a(K,K) 

-a*(l,K) 

R(K+1) 

R(K) 

R(l) 

R(0) 

\ 

0 

1 

(2.35a) 
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(2.35b) 


Comparing  eqns.  (2.35a)  and  (2.35b)  with  eqn.  (2.32),  we  see  that  we  have 

Introduced  one  new  piece  of  data,  R(K+1),  and  three  new  parameters: 

a(K+l,K+l),P  and  Q  .  From  eqn.  (2.35a)  we  can  write 
K.  •  J.  K 

K 

Q  =  R(K+1)  -  Z  a(k,K)R(K+l-k)  (2.36) 

K  k=l 

Comparison  of  eqn.  (2.36)  and  eqn.  (2.27b)  for  EbK  and  r=k+l  shows  that  QK  is 
the  difference  between  the  extrapolated  estimate  of  R(K+1)  produced  by  the 
PEF  of  order  K  and  the  actual  value  of  R(K+1). 

If  Qv  is  zero,  then  we  can  see  from  the  right-hand  sides  of  eqns.  (2.35a) 
and  (2.35b)  that  a(K+l,K+l)  must  also  be  zero,  and  the  value  of  R(K+1)  and  all 
subsequent  values  of  the  autocorrelation  function  are  perfectly  predictable 
from  the  PEF  we  already  have.  Also,  we  can  see  from  eqn.  (2.40)  below  that 
**K  ”  Pr+1  =  ^K+2  =  •••  becomes  constant.  This  special  case  arises  when  the 
process  generating  the  signal  whose  autocorrelation  function  data  are  being 
analyzed  can  be  modelled  as  the  output  of  a  feedback  network  containing 
exactly  K  feedback  loops  and  being  excited  by  an  input  signal  consisting  of 
uncorrelated  white  noise  with  power  density  P^.  Such  a  process  is  often 
referred  to  as  being  "autoregressive"  or  "all-pole",  and  therefore  maximum- 
entropy  spectral  analysis  turns  out  to  be  based  on  an  autoregressive  model 
of  the  signal-generating  process.  Of  course,  if  the  signal-generating 
process  cannot  be  so  modelled,  then  Qk  vanishes  only  in  the  limit  K-*=°.  There 
is  further  discussion  of  this  topic  in  Sections  3.7  and  5.2.1. 

If  Qj£  is  non-zero,  then  by  equating  the  right-hand  sides  of  eqns. 

(2.35a)  and  (2.35b)  we  can  write 

PK  -  a (K+l , K+l ) Q*  -  PK+1  (2.37) 

and 

Qk  -  a(K+l,K+l)PK  =  0  (2.38) 

From  eqn.  (2.38)  we  can  write 


a (K+l, K+l)  -  Q  /Pv 


(2.39) 
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and  by  eliminating  between  eqns.  (2.37)  and  (2.38)  we  can  write 

PK+1  "  (1"  la<K+1.K+1) l2)pK  (2.40) 

Equation  (2.40)  has  the  interesting  implication  that  either 

|a(K+l,K+l)  |  <_  1  (2.41) 

or  else  we  could  have  negative  power  at  the  output  of  the  PEF.  This  possi¬ 
bility  does  not  arise  if  we  have  proper  autocorrelation  data,  for  as  we 
stated  following  eqn.  (2.25b),  it  is  always  possible  to  choose  a  PEF  which 
has  all  its  roots  inside  the  unit  circle  on  the  complex  z-plane.  There  is  a 
theorem  in  algebra  which  states  that  the  coefficient  of  the  last  term  of  the 
z-transform  of  the  PEF  as  we  have  formulated  it  (i.e.,  a(K+l,K+l))  is  equal 

to  (±1)  times  the  product  of  its  K+l  roots.  If  all  the  roots  have  magnitude 

less  than  one,  then  it  follows  that  inequality  (2.41)  will  be  obeyed. 

As  an  aside,  we  note  there  is  one  special  case  where  equality  can  be 
achieved  for  eqn.  (2.41),  and  that  is  when  all  K  roots  of  the  PEF  lie  on  the 
unit  circle.  This  special  case  corresponds  to  having  a  signal  comprised  of 
K  complex  sinusoidal  terms  and  no  additive  noise.  Such  a  signal  is  not 
truly  autoregressive  in  nature,  for,  although  it  can  be  modelled  by  the  out¬ 
put  of  a  feedback  network  which  has  K  feedback  loops,  no  input  signal  is 
required.  The  output  signal  is  determined  strictly  by  the  initial  cond^ions 
when  the  network  was  activated. 

Also,  we  can  see  from  eqn.  (2.40)  that  if  |a(K+l,K+l)j  =1,  then  Pr+1  = 

0.  Going  back  to  eqn.  (2.38),  we  see  that  then  we  must  also  have  QK+1  =  °t 

so  that  the  Levinson  recursion  terminates  for  a  PEF  of  order  K+l  when 
| a (K+l, K+l) |  =  1. 

Returning  now  to  the  main  argument  of  the  derivation,  it  remains  to  be 
shown  how  to  derive  the  remaining  coefficients  a(l,K+l) ,a(2,K+l) , . . . ,a(K,K+l) 
of  the  K+l1-*1  order  PEF  from  the  K11*1  order  PEF.  We  do  this  by  noting  that 
from  the  left-hand  side  of  eqn.  (2.35a)  we  can  write 


1 

-a(l,K) 

-«(2,K) 

0 

-a*(K,K) 

-a*(K-l,K) 

"l 

-a (1 ,K+1) 

-a (2, K+l) 

• 

• 

• 

-a(K,K) 

0 

-a (K+l, K+l) 

-a* (1,K) 

1 

S 

• 

• 

• 

-a(K,K+l) 

-a (K+l, K+l) 

or,  ir.  general 

a (k,K+l)  =  a(k,K)  -  a(K+l,K+l)a* (K+l-k,K) 

k  -  1,2, ... ,K 


(2.43) 
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Thus  we  can  see  that  the  sequence  of  PEFs  from  order  zero  to  order  M 
are  completely  specified  if  the  set  of  coefficients 

Po,a(l,l),a(2,2) . a(K,K) . a(M,M)  (2.44) 

are  known.  For  historic  reasons,  these  ot(K,K)s  are  known  as  reflection 
coefficients,  because  of  their  mathematical  analogy  to  the  reflection 
coefficients  which  arise  in  the  theory  of  wave  propagation  through  a  layered 
medium.  It  is  interesting  to  note  that  this  set  of  coefficients  is  comprised 
of  one  real-valued  term  and  M  complex-valued  terms,  the  same  as  the  original 
set  of  M+l  autocorrelation  values. 

Finally,  we  recapitualte  the  Levinson  recursion  scheme.  We  begin  at 
K«0  and  by  letting  Po=R(0) .  Then  we  apply  eqns.  (2.36),  (2.39),  (2.43),  and 
(2.40)  for  K=0, 1,2, . . . ,M  to  obtain  a  sequence  of  PEFs.  We  end  by  applying 
eqn.  (2.29)  to  obtain  the  Mth-order  maximum  entropy  power  spectral  estimate. 
(Note  that  for  K=0,  the  summation  in  eqn.  (2.36)  is  ignored,  since  the  upper 
limit  of  the  summation  index  is  less  than  the  lower  limit.) 


2.8  EXTRAPOLATING  THE  AUTOCORRELATION  FUNCTION 

Sometimes  it  is  desired  to  extrapolate  the  autocorrelation  function 
data  beyond  the  M+l  values  assumed  available.  We  can  see  by  recalling  the 
discussion  of  Section  2.3  and  by  examining  eqn.  (2.27b)  that  such  an  extra¬ 
polation  arises  naturally  in  the  maximum-entropy  formulation.  By  removing 
the  restriction  r  <  M  on  eqn.  (2.27b)  we  obtain  the  formula  for  the  maximum- 
entropy  extrapolation  R(r)  of  the  autocorrelation  data: 

M 

R(r)  =«  I  a(m,M)R(r-m)  (r  =  M+l,M+2, . . .  ,=°)  (2.43) 

m=l 

R(r-m)  is  replaced  by  R(r-m)  in  the  right-hand  side  of  eqn.  (2.44)  for  r-m  <_  M 
i.e.,  the  given  data  are  used  wherever  available. 


2.9  INVERTING  THE  AUTOCORRELATION  MATRIX 

Sometimes  it  is  necessary  to  compute  the  inverse  of  the  autocorrelation 
matrix.  The  Levinson  recursion  can  be  used  to  do  this  efficiently,  and  we 
shall  show  how  the  inverse  of  either  the  matrix  of  given  autocorrelation  data 
or  the  inverse  of  the  maximum-entropy  extrapolation  of  the  autocorrelation 
matrix  can  be  calculated. 

Equations  (2.30)  and  (2.45)  can  be  combined  to  give 
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R(0) 

R(-l)  ... 

• 

R(-M) 

R(-M-l) 

•  •  • 

fU-M-L) 

1 

R(l) 

• 

R(0)  ’  *  •  . 

•  • 

•  • 

• 

R(-M) 

• 

-a(l,M) 

• 

•  • 

• 

• 

• 

• 

• 

•  • 

•  • 

• 

• 

• 

• 

• 

•  • 

• 

• 

» 

• 

« 

R(-M-l) 

R(M) 

• 

• 

• 

• 

• 

• 

• 

• 

• 

• 

• 

• 

R(-M) 

-ct(M,M) 

R(M+1) 

• 

• 

•  • 

R(M) 

• 

• 

• 

•  • 

• 

• 

• 

• 

• 

• 

• 

• 

•  • 

•  • 

• 

• 

• 

• 

•  • 

■ 

• 

• 

• 

• 

•  • 

0 

• 

= 

• 

R(M+L) 

•  • 

• 

•  • 

• 

• 

...  R(ttfl) 

• 

• 

R(M) 

• 

• 

• 

•  •  •  •  •  » 

•  • 

R(0) 

• 

'r(I) 

*  R(-l) 

R(0) 

0 

(2.46) 


where  we  have  assumed  that  L  values  of  the  extrapolated  autocorrelation 
function  have  been  Included  and  therefore  that  L  zeroes  have  been  appended  to 
augment  the  length  of  the  PEF.  Of  course,  if  L=0,  then  eqn.  (2.46)  reduces 
to  eqn.  (2.30). 


In  order  to  simplify  the  notation,  we  shall  now  denote  the  (M+L+L)  x 
(M+l+L)  autocorrelation  matrix  by  [R(M+L)].  We  want  now  to  determine  its 
inverse,  [R(M+L)]~^.  This  we  can  do  by  noting  that 


1  0  ...  ...  ...  ...  ...  ...  ...  0 
-a,H)  „  : 

-a(2,M)  -a(l,M)  *  .  | 

::**i 


[R(M4L)] 


-a(M,M)  -o(M-l,M)  -a(l,M) 


0  -a (M,M) 


0 


-o(l,M-l) 


L 


0 


I  -a (M-1,M) 

e 

0  ...  -a(M,M) 


-a (M-l ,M-1) 


1  0| 

-o(l,l)  1 


(M+l+L) x (M+l+L) 
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(2.47) 


(M+l+L)  x  (MH+L) 


The  (M+l+L)  x  (M+l+L)  matrix  of  PEFs  and  augmented  PEFs  on  the  left-hand 
side  of  eqn.  (2.47)  we  shall  denote  by  [a(M+L)].  The  explicit  contents 
above  the  diagonal  of  the  matrix  with  the  Ps  along  its  main  diagonal,  denoted 
by  the  large  asterisk,  are  not  required  in  what  follows  below. 

If  we  now  let  [a(M+L)]+  denote  the  conjugate  transpose  of  [a (M+L) ]  we 
can  write 


[a(M+L)]+[R(M+L)][ct(M+L)]  =■  [P(M+L)] 


where 


(2.48) 


(2.49) 


The  reason  that  [P(M+L)]  is  diagonal  is  that  both  [a (M+L)]  and  the  right- 
hand  side  of  eqn.  (2.47)  are  upper-triangular  matrices,  and  therefore  their 
product  must  also  be  upper-triangular.  However,  the  left-hand  side  of  eqn. 
(2.48)  is  Hermitian,  so  that  the  upper-triangular  product  matrix  must  also 
be  Hermitian,  which  can  only  be  true  if  the  off-diagonal  elements  are 
zeroes. 


The  inverse  of  the  diagonal  matrix  [P(M+L)]  we  can  write  by  inspection: 


[P(M+L>] 


() 


(2.50) 


We  now  find  that  a  few  simple  manipulations  of  eqn.  (2.48)  will  lead  us 
to  the  desired  result.  For  brevity  we  will  drop  the  explicit  dependence  on 
the  argument  (M+L) .  Post-multiplying  eqn.  (2.48)  by  [P]-^  yields 


[<*]+[R]  [a]  [P]”1  =  [I] 


(2.51) 


where  [I]  denotes  [I (M+L)],  the  (M+l+L)  x  (M+l+L)  identity  matrix.  Pre¬ 
multiplying  both  sides  of  eqn.  (2.51)  by  [a]  and  post-multiplying  by  [a]+ 
yields 

[a][a]+[R][a][P]"1[a]+  -  [a][a]+  (2.52) 

Finally,  pre-mult iplying  both  sides  of  eqn.  (2.52),  first  by  [a]  ^  and  then 
by  [a]^— 1  (since  [a]  is  clearly  non-singular)  yields  the  desired  result: 

[R][a][Pf1[a]+  -  [I]  (2.53) 


or 

[ R (M+L)] -1  -  [a(M+L)][P(M+L)]_1[a(M+L)]+  (2.54) 

Thus  we  see  that,  given  a  set  of  M  reflection  coefficients  and  PQ,  the  zero- 
lag  value  of  the  autocorrelation  function,  the  inverse  of  the  maximum-entropy 
extrapolation  of  the  autocorrelation  matrix  can  be  computed.  Further  examina¬ 
tion  of  the  inverse  of  the  Toeplitz  autocorrelation  matrix  would  show  that 
this  inverse  matrix  is  both  persymmetric  (i.e.,  symmetric  about  the  diagonal 
from  its  lower-left  to  upper-right  corner)  and  Hermitian,  which  means  that 
the  (M+l+L)  x  (M+l+L)  matrix  has  at  most  (M+l+L) (M+3+L) /4  independent  elements 
if  (M+l+L)  is  even,  and  (M+2+L)2/4  independent  elements  if  (M+l+L)  is  odd. 

This  observation  is  useful  in  simplifying  the  implementation  of  eqn.  (2.54) 
to  derive  the  actual  elements  of  the  inverse  autocorrelation  matrix. 

For  the  particular  case  L=0  (i.e.,  the  autocorrelation  matrix  is  not 
extrapolated  before  inversion),  it  is  straightforward  to  derive  the  equations 
for  the  elements  of  the  inverse  autocorrelation  matrix  [7],  If  we  let  R(i,j) 
denote  an  element  of  this  matrix,  then 
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R(i,j)  -  R*(j,D  -  R(M-i.M-j)  =  R* (M-j  ,M-i) 


n 

*  Z  a (M-k , i-k) a* (M-k , j -k) / P  .  (2.55) 

k-0  M_k 


l,j  =  0,1,.. .,M 

where  n  ts  the  smaller  of  the  indices  i,j  and  we  let  a(M-k,0)  =  -1  as  in 
eqn.  (2.24). 


2.10  DERIVING  THE  AUTOCORRELATION  MATRIX  FROM  A  GIVEN  SET  OF  REFLECTION 
COEFFICIENTS 


In  Section  3  et  seq.  we  shall  derive  the  Burg  algorithm,  which  allows 
us  to  compute  a  set  of  reflection  coefficients  and  an  estimate  of  PD  directly 
from  a  set  of  amplitude  time-series  data.  We  might  then  want  to  estimate  the 
autocorrelation  function  from  the  estimated  reflection  coefficients.  There¬ 
fore  it  is  useful  to  derive  the  equations  for  computing  the  autocorrelation 
function  from  a  given  set  of  reflection  coefficients. 

We  can  easily  obtain  the  required  relations  from  eqns.  (2.27a)  and 
(2.27b),  From  eqn.  (2.27a)  we  have 

R(0)  -  P  (2.56) 

o 


and  from  eqn.  (2.27b)  we  have 

K 

R(K)  =  Z  a (k,K)R(K-k)  (2.57) 

k-1 

K  =  1,2, ... ,M 


where  the  a(k,K)s  are  derived  from  the  set  of  reflection  coefficients  by 
using  eqn.  (2.43). 


2.11  SUMMARY 

We  have  shown  how,  given  a  set  of  perfect  autocorrelation  data,  a 
power  spectrum  which  has  maximum  entropy  in  the  sense  of  Shannon's  Informa¬ 
tion  Theory  can  be  estimated.  We  have  shown  that  this  power  spectrum  is 
closely  related  to  a  unique  set  of  prediction-error  filters  which  can  be 
computed  from  the  data,  and  we  have  derived  the  algorithm  for  computing 
these  prediction-error  filters.  We  have  also  shown  how  to  extrapolate  the 
autocorrelation  data  using  the  maximum-entropy  criterion,  how  to  invert  the 
extrapolated  or  unextrapolated  autocorrelation  matrix,  and  how  to  reconstitute 
the  autocorrelation  data  if  a  set  of  reflection  coefficients  and  the  value 
of  the  zero-log  autocorrelation  is  given  instead. 

We  have  not  considered  the  effects  of  statistical  fluctuations  in  the 
autocorrelation  data,  which  will  exist  in  any  set  of  measured  autocorrelation 
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data.  This  rather  complicated  problem  has  been  considered  by  Baggeroer  [2] 
who  found  that,  in  general,  a  maximum-entropy  spectral  estimate  based  on  a 
given  data  set  will  have  both  a  more  "spikey"  spectrum  and  a  greater  varia¬ 
bility  than  a  Four ier-transform  spectral  estimate  based  on  the  same  data. 


3.  THE  BURG  ALGORITHM  FOR  DETERMINING  AUTOREGRESSIVE  SPECTRAL  ESTIMATES 

FROM  THE  TIME-SERIES  DATA 

We  saw  in  Section  2  how,  given  a  set  of  autocorrelation  data,  we  could 
find  an  estimation  of  the  power  spectrum  which  had  maximum  entropy.  It  turned 
out  that  the  maximum-entropy  spectral  estimate  was  inversely  proportional  to 
the  frequency  response  of  a  prediction-error  filter  (PEF)  which  was  derivable 
from  the  autocorrelation  data. 

In  the  radar  situation,  we  are  often  faced  with  having  available  only 
relatively  short  sets  of  amplitude  time-series  data.  Sometimes  we  want  to 
extract  the  maximum  amount  of  spectral  information  from  these  data,  and  do 
not  want  to  be  subject  to  the  effects  of  spectral  broadening  introduced  when 
conventional  Fourier-transform  techniques  are  used. 

The  Burg  algorithm  for  spectral  analysis  is  based  on  the  concept  of 
deriving  a  set  of  reflection  coefficients  and  hence  a  set  of  PEFs  directly 
from  the  amplitude  time-series  data.  The  logic  behind  the  various  steps  of 
the  algorithm  is  prinicpally  their  analogy  with  the  steps  taken  when  the 
autocorrelation  of  the  data  is  given. 

We  being  by  assuming  that  we  are  given  a  set  of  N  data  x(n):  n=0,l, 
2,...,N-1.  We  want  to  compute  a  set  of  PEFs  from  these  data,  and  by  analogy 
we  want  this  set  of  PEFs  to  have  the  same  properties  as  if  they  had  been 
derived  from  autocorrelation  data.  In  fact,  having  computed  such  a  set  of 
PEFs,  we  can  then  obtain  from  them  a  set  of  autocorrelation  estimates  by 
inverting  the  Levinson  recursion  scheme  as  shown  in  Section  2.9. 

These  autocorrelation  estimates  Rg(m)  would  not  be  the  same  as  those 
derived  by  taking  the  conventional  lagged  cross-product  average 


N-m 

R  (m)  =  S(m)  £  x*(n)x(n+m)  (3.1) 

C  n=0 


where  S(m)  is  a  scaling  factor.  (Compare  eqns.  (3.1)  and  (2.1).)  For  refer¬ 
ence  in  what  follows,  we  note  that  if  the  scaling  factor 


S(m)  =  1/N 

is  chosen,  that  Rc(m)  is  biased,  since  then 

<Rc(m)>  =  (~j  R(m) 


(3.2) 


(3.3) 


and  the  expected  value  of  Rc(m)  is  linearly  tapered  to  zero  at  m=N  lags.  If 
the  scaling  factor 
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S(m)  =  1/ (N-m)  (3.4) 

is  chosen,  then  Rc (ra)  is  unbiased,  since  then 

<R  (m)>  =  R(m)  (3.5) 

c  l N-m  / 

but  the  expected  variance  of  Rc(m)  for  successively  greater  values  of  m 
becomes  greater,  due  to  the  smaller  number  of  terms  in  each  successive 
summation. 


3.1  THE  RELATION  BETWEEN  MAXIMUM-ENTROPY  AND  A  MINIMUM  POWER  OR  MINIMUM-MEAN- 
SQUARE  CRITERION 


In  order  to  determine  a  criterion  which  can  be  used  to  estimate  a  PEF 
from  the  time-series  data,  it  is  useful  to  examine  the  properties  of  the 
maximum-entropy  PEFs  derived  from  known  autocorrelation  data.  For  compactness 
of  notation  we  let  the  Mth-order  PEF  be  denoted  by  the  column  vector  £(M)  where 


ct(M)  = 


1 

-a(l,M) 


(3.6) 


-a (M,M) 


and  the  conjugate  transpose  of  £(M)  by  £  (M)  where 

a+(M)  =  [1,-a* (1,M) . -a* (M,M) ]  (3.7) 

We  also  let  the  (M+l)  x  (M+l)  autocorrelation  matrix  be  denoted  by  [R(M)]  and 
note  that  it  is  equal  to  its  own  conjugate  transpose. 

Now  we  note  that 

a+(M)[R(M)J  =  [PM,0 . 0]  (3.8a) 

[R (M) ]  a  (M)  =  [PM,0 . 0]+  (3.8b) 

and  £+(M)[R(M)]£(M)  =  PM  (3.9) 

where  PM  is  the  output  power  as  before.  We  want  to  know  what  the  output 
power  would  be  if  any  other  PEF,  say  _8(M) ,  where 
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1 

-6(1, M) 

JJ(M)  =  .  (3.10) 

were  to  be  used  Instead.  Therefore  we  consider  the  expression 
(B.+(M)-a+(M))[R(M)](e.(M)-a(m))  -  J3+(M)  [R  (M)  ]  B  (M)  -  a+(M)  [R(M)  ]jg(M)  (3.11a) 

-  £+(M)[R(M)]a(M)  +  a+(M)[R(M)]a(M) 

-  8+(M)[R(M)]6(M)  -  PM“PM+PM  (3.ilb) 

«  e+(M)[R(M)]8(M)  -  PM  (3.11c) 

Therefore  we  find  that 

S+(M)[R(M)]B(M)  =  PM  +  (8+(M)-a+(M))[R(M)](_g.(M)-a(M))  (3.12) 

We  note  that  we  can  write,  for  any  column  vector  X_(M)  of  length  M+l, 

M  M 

y  (M)[R(M)]y(h'>  -  £  £  y*(k)R(k, 1) y(l)  (3.13) 

k-0  1=0 

where,  by  comparison  with,  say,  eqn.  (2.32) 

R(k,l)  =  R(k-l)  (3.14) 

Also,  from  eqn.  (2.1)  we  can  write 

R(k-l)  =  <x*(t)x(t+k-!)>  (3.15a) 

or  R(k-l)  =  <x*(t-k)x(t-l)>  (3.15b) 

or  R(k-l)  =  <x*(t+l)x(t+k)>  (3.15c) 


if  the  process  generating  x(t)  is  stationary.  Then  we  can  write,  using  eqn. 
(3.15b), 

MM 

£ (M)[R(M)1x(M)  =  £  £  Y*(k)<x*(r-k)x(t-t)>y(t)  (3.16a) 

k=0  1=0 

M  M 

=  <  £  y*(k)x*(t-k)  £  y (!)x(t-l)>  (3.16b) 

k-0  1-0 


25 


M 

i  Ya)x(t-Jt) 

1=0 


(3.16c) 


>_  0  (3. 16d) 

Applying  this  general  result  to  eqn.  (3.12)  we  can  see  that  any  PEF  other 
than  the  one  generated  from  the  autocorrelation  data  by  the  Levinson  recursion 
will  have  an  output  power  greater  than  PM-  From  this  Burg  inferred  that  a 
minimum  mean-square- error  criterion  is  appropriate  for  use  in  estimating  the 
Burg  PEF. 


3.2  JUSTIFICATION  FOR  SIMULTANEOUSLY  MINIMIZING  THE  FORWARD  AND  BACKWARD 
OUTPUT  POWER  FROM  PEF 


Going  back  to  eqn.  (3.16c),  w(  note  that  we  can  write 


a+(M)[R(M)]a(M) 


x(t) 


2 

M 

Z  ce(m,M)x(t-m)  > 
m=l 


(3.17) 


-  P 


M 


However,  in  eqn.  (3.1  a)  we  could  just  as  easily  have  used  eqn.  (3.15c)  for 
R(M),  to  have  gotten  instead  the  result 


a+(M)[R(M)]a(M) 


x(t) 


M 

Z  a*(m,M)x(t+m) 
m=l 


2 


> 


(3.18) 


M 


Comparing  eqns.  (3.17)  and  (3.18),  we  see  that  in  the  former  the  PEF  is 
applied  to  the  time  series  data  in  the  usual  forward  direction,  and  that  the 
expected  mean-square  output  is  equal  to  in  the  latter  equation,  the 

complex  conjugate  of  the  PEF  is  applied  to  the  data  in  the  reverse  direction, 
and  again  the  expected  mean-square  output  is  equal  to  P^.  From  this  observa¬ 
tion  Burg  inferred  that  one  ought  to  apply  the  Burg  PEF  to  the  data  in  both 
directions  simultaneously,  and  then  minimize  the  mean  of  the  forward  and 
backward  output  powers. 


3.3  COMPUTATION  OF  THE  BURG  PEFs 

The  zero-order  PEF  is,  by  definition,  the  constant  "one".  Thus  the 
output  energy  of  the  zero-order  PEF  is  estimated  by 

N-l  ■ 

En  -  Z  I x(n) | 2  (3.19) 

n«0 

so  that  Eq/N  is  an  unbiased  estimate  of  P0. 
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If  the  first-order  Burg  PEF  is  applied  to  the  data  in  the  forward 
direction,  without  going  beyond  the  ends  of  the  data-set,  we  get  a  sequence 
of  N-l  outputs,  denoted  by  f^(n),  where 

f  ^(n)  =■  x(n+l)  -  0(1,  l)x(n)  n=0, 1, . . . ,N-2  (3.20) 

Here  0(1,1)  is  the  Burg  estimate  of  the  first-order  reflection  coefficient. 
Similarly,  if  we  apply  the  conjugate  first-order  PEF  to  the  data  in  the 
backward  direction,  we  get  the  output  sequence 


b^ (n)  =  x(n)  -  8*(l,l)x(n+l)  n=0,l, . . . ,N-2 

Then  the  output  energy  of  the  first  order  PEF  is  estimated  by 

N-2 


E,  -  J  I  (|f,(n)|2  +  |b  (n) | 2) 
n-0 


(3.21) 


(3.22) 


Following  the  line  of  reasoning  given  above,  that  the  maximum-entropy 
PEF  has  minimum  output  energy,  we  want  to  estimate  0(1,1)  by  minimizing  E^ 
with  respect  to  0(1,1).  We  do  this  by  expanding  eqn.  (3.22),  taking  the 
first  derivative  with  respect  to  0(1,1),  and  equating  the  result  to  zero. 
The  result  is 


N-2 

Z  x*(n)x(n+l) 


| x(n+l) | 2)  (3.23) 

n=0 

We  notice  that  the  numerator  of  eqn.  (3.23)  looks  very  much  like  the  lagged 
cross-product  average  of  eqn.  (3,1)  for  a  lag  of  one  unit  (M=l)  and  S(m)=l, 
and  the  denominator  looks  something  like  the  zero-lag  cross-produce  of  eqn. 
(3.1),  except  that  the  terms  |x(0)|2  and  |x(N-l)|2  appear  only  once;  the 
other  N-2  terms  appear  twice.  We  also  notice  that  in  general 

|S(1,1)|<1  (3.24) 

which  is  a  consequence  of  the  well-known  Schwartz  Inequality. 

If  we  take  separately  the  expected  values  of  the  denominator  and 
numerator  of  eqn.  (3.23),  we  note  that 


6(1,1)  = 


nj0 


„-2 


( |  x(n)  | 


N-2 


<  Z  x*(n)x(n+l)> 

n=Q _  a  (N-l)R(l) 


<  \  z  (|x(n)j2  +  |x(n+l)|2)>  2  (N_1)2R(0) 

(3.25a) 

n=0 

=  R(1)/R(0) 

(3.25b) 

=  0(1,1) 

(3.25c) 

L 
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In  deriving  eqn.  (3.25a)  we  have  used  eqn.  (2.1),  and  we  have  used  eqns. 

(2.36)  and  (2.38)  with  K=0  in  going  from  eqn.  (3.25b)  to  eqn.  (3.25c).  Thus 
we  can  see  that  the  first-order  Burg  reflection  coefficient  approximates  the 
true  maximum-entropy  reflection  coefficient  if  the  expected  values  are  taken 
in  the  manner  shown.  In  general,  however,  this  procedure  is  unjustifiable, 
and  <8(1, 1)>  4  o(l,l). 

We  progress  now  to  the  estimation  of  the  second-order  Burg  reflection 
coefficient.  Again  we  apply  the  Burg  PEF  to  the  data  in  both  the  forward  and 
backward  directions,  without  going  beyond  the  ends  of  the  data-set.  Then  we 
estimate  the  mean  of  the  forward  and  backward  output  energies,  and  minimize 
this  energy  by  adjusting  only  the  second-order  reflection  coefficient. 

Applying  the  second-order  Burg  PEF  in  the  forward  direction,  we  obtain 
a  sequence  of  N-2  outputs,  denoted  by  f2(n),  where 

f2(n)  =  x(n+2)  -  0(1 ,2)x(n+l)  -  B(2,2)x(n) 

n=0, 1 . N-3  (3.26) 

Similarly,  applying  the  conjugate  second-order  Burg  PEF  in  the  backward 
direction,  we  obtain  a  sequence  of  N-2  outputs,  denoted  by  b2(n),  where 

b2(n)  =  x(n)  -  8*(l,2)x(n+l)  -  8* (2 , 2)x (n+2) 

n=0,l,...,N-3  (3.27) 

We  notice  (for  example)  that  the  first  (n=0)  term  of  each  of  the  sequences 
f2(n)  and  b2 (n)  is  dependent  on  the  first  (n=0) ,  second  (n=l)  and  third  (n=2) 
terms  of  the  original  data  sequence,  and  that  there  are  two  PEF  parameters, 
8(1,2)  and  8(2,2),  to  be  determined. 

We  recall  now  that  we  want  to  make  the  Burg  PEF  as  analogous  as  possible 
to  the  maximum  entropy  PEF,  If  this  is  to  be  the  case,  then  we  can  use  eqn. 
(2.43)  to  express  8(1,2)  in  terms  of  8(1,1)  and  8(2,2): 

8(1,2)  =  8(1,1)  -  8(2,2)S*(1,1)  (3.28) 

On  a  physical  basis,  it  can  be  considered  that  8(1,1)  contains  all  the 
information  possible  concerning  how  much  of  the  signal  can  be  predicted  on 
the  basis  by  considering  the  data  two  at  a  time.  Therefore  any  additional 
predictability  of  the  signal  which  can  be  discovered  on  the  basis  of 
examining  the  data  three  at  a  time,  must  be  expressable  as  a  function  of 
8(2,2)  only.  However,  we  also  note  that  this  restriction  can  mean  that  the 
absolute  minimum  output  energy  achievable  if  both  8(1,2)  and  8(2,2)  were 
simultaneously  adjusted  may  not  be  realized.  This  is  the  price  paid  for 
forcing  a  Toeplitz  structure  on  the  estimated  autocorrelation  matrix.  In 
general,  sample  autocorrelation  matrices  derived  from  equi-spaced  sampled 
data  are  Hermitian,  but  are  not  Toeplitz  [11]. 

By  substituting  eqn.  (3.28)  into  eqn.  (3.26)  and  using  eqns.  (3.20) 
and  (3.21),  we  find  that  f2(n)  is  expressable  in  terms  of  f^(n)  and  b^(n): 
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f2(n)  -  x(n+2)  -  [6(1,1)  -  B(2,2)6*(l,l)]x(n+1)  -  B(2,2)x(n)  (3.29a) 

-  [x(n+2)  -  6(1, l)x(n+l) ]  -  6(2,2)[x(n)  -  6*(l,l)x(n+l) ]  (3.29b) 

=  f^n+1)  -  6(2,2)b1(n)  (3.29c) 

n=0, 1 . N-3 

Similarly,  we  find 

b2(n)  =  x(n)  -  [6* (1,1)  -  6*(2, 2)6(1, l)]x(n+l)  -  6* (2, 2)x(n+2)  (3.30a) 

=  [x(n)  -  8*(1, l)x(n+l) ]  -  6*(2,2) [x(n+2)  -  6(l,l)x(n+l) ]  (3.30b) 
=  bx(n)  -  6*(2,2)f1(n+l)  (3.30c) 

n*0, 1, . . . ,N-3 

For  the  second-order  Burg  PEF,  the  output  energy  E2  is  given  by 


E,  -  7  Z  (|f9(n)|2  +  | b  (n) | 2)  (3.31) 

n=0  1 

Following  the  same  procedure  that  led  to  eqn.  (3.23),  we  find  that  6(2,2)  is 
given  by 

N-3 

I  b*(n)f1(n+l) 

8(2,2) - ^3-""° -  (3.32) 

7  Z  (|b1(n)|2+  |f1(n+l)|2) 
n=0 

Comparing  eqns.  (3.32)  and  (3,23)  we  note  the  structural  similarity.  In  fact, 
if  we  let 

bQ(n)  =  x(n)  =  fQ  (n)  n=0, 1, . . . ,N-1  (3.33) 

we  can  rewrite  eqn.  (3.23)  as 

N-2 

I  b*(n)fQ(n+l) 

6(1,1)  =  -N_2  n=° -  (3.34) 

|  1  ( | b  (n) | 2  +  |f  (n+l)|2) 


In  general,  we  find  that 
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0(K,K) 


N-l-K 

En  btl(n)fK-l(n+1) 

_ n=0 _ _ 

7  Z  ^  bK-l I  2  +  I fK-l ^n+1) I 2) 


K=1,2,...,N-1 


(3.35) 


and 

fK(n)  =  fK_1(n+l)  -  eOC.lOb^Cn)  (3.36) 

bK(n)  -  bK_x(n)  -  8*  (K ,  K) f (n+1 )  (3.37) 


n=0,l, . . . , N-l-K 

Equation  (3.35)  tells  us  that  successively  higher  orders  of  the  Burg  PEF 
reflection  coefficients  can  be  calculated  by  successively  computing  the 
single-lag  cross-correlation  between  the  forward  and  backward  output  signals 
from  the  immediately  lower  order  PEF  and  normalizing  by  the  mean  energy  of 
those  two  output  signals,  excepting  that,  in  calculating  this  mean,  the  last 
backward  output  and  the  first  forward  output  are  omitted.  Applying  Schwartz's 
inequality  to  eqn.  (3.35)  again  tells  us  that,  in  general, 

j  0  (K,K)  j  _<  1  (3.38) 

which  conforms  with  the  restriction  (2,41)  on  ja(K,K)j  for  the  MEM  estimate. 

th 

Finally,  then,  we  have  derived  the  K  order  Burg  spectral  estimate 


5k(£) 


K 

1  -  l 
k=l 


8(k,K) exp (- j  2rkf ) 


(3.39) 


where,  by  analogy  with  eqn.  (2.40),  the  output  powers  Jljr  are  defined  as 

n  =  E  /N  K=0  (3.40) 


and 


nK  =  nK-l  (1  “  lB(K’K)l2) 


(3.41) 


K=1,2,3,...,N-1 


and  by  analogy  with  eqns.  (2.43)  and  (3.28) 


8(k,K) 


B(k,K-l)  -  B(K,K)  0* (K-k,K-l) 


k-l,2,...,K-l 
K-1,2,3,.  ..,N-1 


(3.42) 
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3.4  BIAS  OF  THE  BURG  ESTIMATE  OF  THE  AUTOCORRELATION  FUNCTION 

We  saw  from  eqns.  (3.25a-c)  that  the  Burg  estimate  of  the  first-order 
reflection  coefficient  is  roughly  equivalent  to  the  true  maximum-entropy 
estimate.  Unfortunately,  however,  the  Burg  estimate  is  not  in  general 
uribiased. 


The  existence  of  bias  in  the  Burg  algorithm  can  be  demonstrated  by 
considering  the  following  special  case  [11].  From  eqn,  (2.57),  for  K“1  we 
can  write 

R(l)  ■*  a(l,l)P  (3.43) 

o 

where 


P  -  R(0) 
o 


By  analogy,  we  can  also  write 


R(l)  -  0(1,1)11 

D  O 


(3.44) 


(3.45) 


where  Rg(l)  can  be  taken  as  the  Burg  estimate  of  the  first  lag  of  the  auto¬ 
correlation  function.  If  we  substitute  eqns.  (3.23)  and  (3.19)  for  N=3  into 
eqn.  (3.45)  then  we  get 


rb(D 


x* (Q)x(l)  +  x* (l)x(2) 


x(0) I  2  +  lx(l)l2  +  1 x(2) 


j  [|x(0) 


+  2|x(l)|2  +  | x (2) | 2] 


(3.46) 

We  can  see  that  <Rfi(l)>  depends  on  more  than  just  <x*(0)x(l)>  =  <x*(l)x(2)>  = 
R(l) ;  it  depends  on  the  joint  probability  distribution  of  (x(0),x(l)). 

Following  the  example  [11],  we  let  the  data  be  real,  and  we  let 

x(0)=  u  (3.47a) 

x(l)  =  —  (u+v)  (3.47b) 

and  x(2)  =■  v  (3.47c) 

where  u  and  v  are  independent,  zero-mean,  unit-variance  random  variables  with 
Gaussian  distribution.  Then  we  have 

<x(0)x(l)>  =  <x(l)x(2)>  -  —  (3.48a) 

Jl 


and 


>*x(0)x(2)>  »  0 


(3.48b) 
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By  substituting  eqns.  (3.47a-c)  into  eqn.  (3.46),  we  obtain 

1_  1  (u-Hv)2(3u2+2uv+3v2) 


*b(1> 


Jl  6  u2  +  uv  +  v2 


(3.49) 


so  that 


<RBa> 


OO 

>  -  JL  I  i_  f  f  dudv  exp  (- 

Jl  6  2n  JJ  \  2  /  u2+uv+v2 


(3.30a) 


1®  2n 

dr  r3  «p(-r 3/2)  (  de 

J  C^CS+S2 


(3.50b) 


where  we  have  changed  to  polar  coordinates  and  let  C  =  cos0  and  S  =  sin6. 

The  integral  over  r  in  eqn.  (3.50b)  is  equal  to  2  and  the  integral  over  6  is 
equal  to  4ir(2-l/i^3)  from  [11].  Therefore 

<R(1)  >  =  —  — ~  —  =  —  x  0.9484  (3.51) 

B  Jl  9  Jl 

which  is  not  equal  to 

R(l)  -  <x(0)x(l)>  =  <x(l)x(2)>  =  ~  (3.52) 

Jl 

Thus  we  see  that  the  Burg  estimate  of  the  first  lag  of  the  autocorrela¬ 
tion  is  biased;  but,  more  important,  we  see  that  the  bias  is  inherently 
dependent  on  both  the  specific  statistical  distribution  describing  the  data 
and  the  spectral  characteristics  of  the  data  as  described  by  the  true  auto¬ 
correlation  function.  It  is  this  phenomenon  that  makes  theoretical  studies 
of  the  statistical  behavior  of  the  Burg  algorithm  extremely  difficult. 


3.5  ESTIMATING  THE  APPROPRIATE  ORDER  OF  THE  BURG  SPECTRAL  ESTIMATOR 

We  saw  in  Section  2.7,  in  the  discussion  following  eqn.  (2.36),  that 
the  parameter  is  the  arithmetic  difference  between  successive  maximum- 
entropy  estimates  of  the  autocorrelation  functions  and  the  given  autocorrela¬ 
tion  data  when  such  autocorrelation  data  are  known  and  available.  In  parti¬ 
cular,  we  saw  that  Q^,  the  difference  between  the  extrapolated  value  of  the 
K+lc^  autocorrelation  sample  and  the  actual  value  of  the  K+lth  autocorrela¬ 
tion  sample,  goes  to  zero  if  the  process  generating  the  data  can  be  perfectly 
modelled  by  a  feedback  network  containing  exactly  K  feedback  loops,  being 
excited  by  an  input  signal  consisting  of  white  noise.  Thus,  for  this  very 
special  case,  we  have  a  perfect  indicator  for  determining  the  correct  order 
of  PEF  required  to  model  the  process. 

We  also  noted  in  Section  2.7  that  a  consequence  of  having  QK=0  was  that 

the  K+lch  reflection  coefficient  became  zero,  so  that  P  ■  P  and  the  out- 

K+l  K. 


put  power  of  the  PEF  decreased  no  further  as  attempts  to  derive  higher-order 
PEFs  were  made.  This  observation  might  lead  us  to  attempt  to  use  some  similar 
criterion  to  determine  the  most  appropriate  order  of  PEF  with  which  to  model 
the  signal-generating  process.  However,  it  is  clear  that,  even  if  the  signal¬ 
generating  process  were  truly  autoregressive  and  of  order  M,  in  general  the 
statistical  fluctuations  which  normally  occur  in  finite-length  data  records 
would  be  sufficient  to  prevent  8(M+1,M+1),  the  Burg  reflection 

coefficient,  from  vanishing. 

There  have  been  three  criteria  proposed  for  estimating  the  appropriate 
Older  M  of  the  Burg  PEF  used  to  make  a  spectral  estimate  from  a  given  set  of 
amplitude  time-series  data.  We  will  not  attempt  a  detailed  derivation  of  any 
of  them  here,  but  simply  reference  their  sources. 

The  first,  and  most  commonly  used  criterion,  is  known  as  Axaike's 
"final  prediction  error"  (FPE)  criterion  [12],  and  is  defined  as 


FPE(M)  =  H 


P+M) 

M(N-M( 


(3.53) 


where  7.^  is  defined  by  eqns.  (3.41)  and  (3.42),  N  is  the  number  of  time-series 
data,  and  M  is  the  order  of  PEF  currently  under  consideration.  The  term  M 
is  replaced  by  (M+l)  in  both  the  numerator  and  the  denominator  if  the  mean 
has  been  estimated  and  subtracted  from  the  data.  Heuristically,  we  can  note 
that  7m  is  a  monotonically  decreasing  function  of  M,  reflecting  the  fact  that, 
in  general,  the  more  parameters  we  fit  to  the  data,  the  better  the  fit 
achieved.  Conversely,  the  factor  in  the  braces  is  a  monotonically  increasing 
function  of  M,  reflecting  the  fact  that  the  more  parameters  fitted  to  the 
data,  the  less  certain  is  the  statistical  reliability  of  each  successively 
higher-order  PEF.  Statistical  fluctuations  can  arise  causing  local  minima 
in  the  FPE  estimates  from  a  given  set  of  data;  it  is  usual  to  choose  that 
value  for  M  which  gives  the  global  minimum  for  the  FPE.  The  FPE  was  derived 
as  a  measure  of  the  mean-square  error  to  be  expected  when  a  PEF  derived  from 
one  set  of  time-series  data  is  applied  to  an  independent  set  of  time-series 
data  obtained  from  the  same  (assumed  stationary)  source. 

A  second  criterion,  also  due  to  Akaike,  is  called  "an  information- 
theoretic  criterion"  (AIC)  [13]  and  is  given  in  Ref.  [14]  as 


AIC(M)  =  log  n„  +  2M/N 
e  M 


(3.54) 


Again  the  value  of  M  corresponding  to  the  global  minimum  of  AIC(M)  is  to  be 
chosen.  The  AIC  is  a  measure  of  the  negative  of  the  log-likelihood  ratio  of 
the  estimate  %  the  power  of  the  white-noise  excitation  of  the  assumed- 
autoregressive  signal-generating  process,  expressed  as  a  function  of  the 
order  of  the  fitted  PEF.  The  minimum  AIC  occurs  on  the  average  for  the  most 
likely  to  be  appropriate  value  for  M,  the  order  of  the  assumed-autoregressive 
generating  process. 


Parzen  [15]  has  defined  what  he  calls  a  "criterion  autoregressive 
transfer  function"  (CAT)  as  a  measure  of  the  mean-square  error  between  the 
"true"  PEF  of  unknown,  possible  infinite  length,  and  the  estimated  PEF.  This 
can  be  done  without  explicit  knowledge  of  the  "true"  PEF,  and  the  CAT(M)  is 
defined  as 
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CAT  (M) 


1_  v  SlE  Mdi 
N  .Nil  ‘  NH 
m=l  m  m 


(3.55) 


Again,  the  value  of  M  corresponding  to  the  global  minimum  of  CAT(M)  is  to  be 
chosen. 


These  criteria  are  discussed  more  fully  in  Ref.  [14];  in  particular, 
simulation  results  are  presented  which  indicate  that  for  many  applications, 
the  simplest  criterion,  the  FPE,  is  the  most  satisfactory. 


4.  A  BRIEF  SUMMARY  FOR  OTHER  TECHNIQUES  FOR  DETERMINING  AUTOREGRESSIVE 
SPECTRAL  ESTIMATES  TROM  TIME-SERIES  DATA 

There  are  several  other  techniques  for  making  autoregressive  spectral 
estimates  from  time-series  data,  but  none  of  them  are  equivalent  to  the 
maximum-entropy  method  in  the  same  sense  as  the  Burg  algorithm.  These  other 
methods  can  be  divided  into  two  classes.  One  class  is  typified  by  the  Yule- 
Walker  technique,  wherein  biased  or  unbiased  estimates  of  the  autocorrelation 
function  are  first  made  from  the  time-series  data  by  means  of  eqn.  (3.1),  and 
then  a  PEF  is  estimated  by  solving  the  same  form  of  matrix  equation  as  is 
solved  for  the  maximum-entropy  PEF  when  the  autocorrelation  data  are  known 
exactly.  The  other  class  is  typified  as  an  unconstrained  minimization 
procedure  [16]  wherein  for  each  order  of  PEF,  the  output  power  is  minimized 
with  respect  to  all  of  the  PEF  coefficients  simultaneously  (except  for  the 
leading  term) . 


4.1  THE  YULE-WALKER  TECHNIQUE 

The  Yule-Walker  technique  exacts  one  of  two  penalties  in  its  use, 
depending  on  whether  biased  or  unbiased  estimates  of  the  autocorrelation  are 
used.  If  unbiased  estimates  are  used,  then  loss  of  spectral  resolution  can 
be  expected  because  of  the  windowing  effect  described  in  Section  1.3.  If 
biased  estimates  are  used,  then  it  is  possible  that  a  PEF  that  does  not  have 
all  its  zeroes  inside  the  unit  circle  (see  Section  2.7)  can  be  designed.  In 
this  case  the  extrapolated  autocorrelation  function  derived  using  eqn.  (2.45) 
would  diverge  rather  than  converge  to  zero  as  the  lag  tended  to  infinity.  In 
terms  of  effects  on  the  estimated  spectrum,  it  is  not  clear  if  there  are  any 
that  are  significant. 

An  apparent  disadvantage  of  the  Yule-Walker  techniques  is  that  they 
require  prior  computation  of  autocorrelation  estimates  before  they  are 
executed.  Of  course  if  there  is  an  excess  of  data,  it  may  be  possible  to 
compact  the  data  by  computing  the  autocorrelation  cross-product  sums  of  eqn. 
(3.1)  as  the  data  are  being  collected,  so  that  the  apparent  disadvantage 
becomes  in  fact  an  advantage.  It  seems  unlikely  that  this  would  ever  be  the 
situation  in  a  radar  application. 


4.2  THE  UNCONSTRAINED  MINIMIZATION  TECHNIQUE 


As  mentioned  previously,  the  unconstrained  minimization  technique 
requires  that  for  each  order  of  PEF  desired,  the  minimization  procedure  must 
be  reapplied  to  the  amplitude  time-series  data.  This  means  that  the 
recursive  efficiency  of  the  Burg  algorithm  is  lost.  The  output  power  which  is 
minimized  can  be  either  the  forward-output  power,  where  the  filter  is  applied 
to  the  data  in  the  forward  direction,  or  the  reverse  output  power,  where  the 
conjugate  filter  is  applied  in  the  reverse  direction;  or  some  combination  of 
these  powers,  such  as  the  average.  It  has  been  reported  on  the  basis  of 
simulation  studies  [16]  that  minimizing  the  average  power  appears  to  give  more 
accurate  spectral  estimates,  as  opposed  to  minimizing  either  the  forward  or 
reverse  power  only. 

A  major  computational  disadvantage  of  the  unconstrained  minimization 
techniques  is  that  they  require  th*  inversion  of  KxK  matrix,  where  K  is  the 
order  of  the  PEF  being  estimated.  The  matrix  has  some  symmetry  properties 
which  can  be  used  to  simplify  the  inversion  process  [17],  but  the  matrix  does 
not  have  Toeplitz  symmetry  so  that  the  relatively  straightforward  Levinson 
recursion  technique  is  not  applicable. 


5.  APPLICABILITY  OF  THE  BURG  ALGORITHM  TO  RADAR  SIGNAL  PROCESSING 

In  the  time-domain  processing  of  radar-ec.io  data  there  appear  to  be 
three  potential  applications  for  the  Burg  algorithm:  clutter  spectral 
estimation  and  classification,  Doppler-shifted  target  detection  and  classifi¬ 
cation,  and  inverse  clutter-correlation  matrix  estimation.  In  the  spatial- 
domain  processing  of  radar  data  from  antenna  arrays,  there  appears  to  be 
potential  application  for  enhancing  angular  resolution  somewhat  beyond  that 
given  by  conventional  beamforming  algorithms.  As  well,  estimation  of  the 
inverse  of  the  apertui  -correlation  matrix  is  possible.  This  last  topic,  which 
is  relevant  to  sidelobe  cancellation,  is  not  discussed  further  here. 

In  order  to  assess  the  feasibility  and  performance  of  the  algorithm 
for  these  applications,  the  questions  of  complexity  of  implementation, 
statistical  properties  and  biases  must  be  considered.  Since  the  first 
question  is  the  most  tractable,  it  will  be  examined  first. 


5.1  COMPUTATIONAL  COMPLEXITY  OF  THE  BURG  ALGORITHM 

5.1.1  Numbers  of  Arithmetic  Operations  Required  to  Compute  the  PEF 

From  eqn.  (3.35)  it  is  apparent  that  when  the  (N-K+l)  fK_^(n)s  and 
b^_i(n)s  are  available,  then  (N-K)  complex  multiplications  are  required  to 
compute  the  numerator  of  8(K,K)  and  3 (N-K)  real  additions  and  4 (N-K)  real 
multiplications  are  required  to  compute  the  denominator.  Since  a  complex 
multiplication  comprises  2  real  additions  and  4  real  multiplications,  then 
we  can  see  that  it  requires  5 (N-K)  real  additions,  8(N-K)+1  real  multiplica¬ 
tions  and  2  real  divisions  to  evaluate  eqn,  (3.35) 
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For  K>1,  it  is  also  necessary  to  evaluate  eqns.  (3.36)  and  (3.37), 
which  each  require  (N-K)  complex  additions  and  (N-K)  complex  multiplications 
or  4(N-K)  real  additions  and  4 (N-K)  real  multiplications.  Therefore,  to 
compute  the  sequence  of  reflection  coefficients  £(K,K),  K=1,2,...,M  necessary 
to  compute  the  M^-order  Burg  PEF  requires  {N[13M-8]-M[13(M+l)/2-8] }  real 
additions,  {N[16M-2]-M[8M-l] }  real  multiplications  and  2M  real  divisions. 

In  order  to  compute  spectra  from  the  sequence  of  M  reflection  coeffici¬ 
ents,  it  is  necessary  to  compute  from  them  the  M^-order  PEF  by  means  of 
eqn.  (2.42).  It  is  straightforward  to  show  that  it  requires  2M(M-1)  each  of 
real  additions  and  real  multiplications  to  compute  this  PEF. 

After  the  PEF  has  been  evaluated,  it  is  possible  to  obtain  unsealed 
spectral  estimates  by  arbitrarily  setting  nK**l  for  K-M  in  eqn.  (3.39). 
However,  if  it  is  required  that  the  spectral  power  be  accurately  represented 
in  the  9ense  that 

0.5 

Vf)df  “  no  <5a) 

•  -0.5 


then  it  is  necessary  that  be  computed  from  ~q  at  the  cost  of  2M  real 
additions  and  3M  real  multiplications.  As  well,  an  additional  (2N-1)  real 
additions,  (2N+1)  real  multiplications  and  one  real  division  are  required  to 
evaluate  Hq . 

In  total,  then  it  requires  at  most  [N(13M-6)-3M(3M-l) / 2-1]  real 
additions,  [16NM-2M(3M-1)+1]  real  multiplications  and  (2M+I)  real  divisions 
to  evaluate  all  the  parameters  required  to  specify  the  Mth-order  Burg 
spectral  estimator.  Some  gains  in  efficiency  can  be  realized  by  eliminating 
redundancies  in  computing  the  denominator  of  eqn.  (3.35)  [18];  taking  these 
improvements  into  account  reduces  the  numbers  of  computations  to  {[N(9M-2)~ 
M(5M-17) /2]-10}  real  additions,  [4N(3M+l)-M(4M-9)-8]  real  multiplications, 
and  (2M+1)  real  divisions.  In  practice,  it  would  turn  out  that  many  of  these 
operations  could  be  done  in  parallel.  The  numbers  of  operation.-  required 
for  representative  values  of  N  and  M  are  given  in  Tables  5.1  and  5.2. 

5.1.2  Dynamic  Range  Considerations 

After  the  PEF  of  order  M  and  the  scale  constant  have  been  evaluated, 
it  is  necessary  to  evaluate  eqn.  (3.39),  which  is  a  function  of  the  continuous 
variable  f,  the  normalized  frequency.  Since  the  spectrum  is  a  true  power 
spectral  density  estimate,  it  is  possible  for  there  to  occur  very  high  spikes 
of  very  narrow  spectral  width,  corresponding  to  a  large  fraction  of  the  total 
signal  power,  whenever  there  is  a  highly  coherent  (e.g.,  sinusoidal)  component 
present  with  high  SNR  in  the  data.  This  implies  that  if  narrow-band  signals 
are  being  sought,  then  "closely-spaced"  evaluations  of  the  power  density 
estimate  must  be  made  if  such  spectral  lines  are  not  to  be  missed. 

If  we  again  examine  the  denominator  of  eqn.  (3.39)  we  note  that  it  has 
the  form  of  a  discrete  Fourier  transform,  and  that  peaks  in  the  spectral 
estimate  correspond  to  minima  in  this  Fourier  transform.  Since  we  are  most 
interested  in  the  smallest  values  of  this  denominator,  it  appears  that  we 
may  well  be  faced  with  a  dynamic  range  problem  in  these  regions. 
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TABLE  5. 1 


Number  of  Real  Additions  Required  to  Derive  an  -order  Burg  PEF  from  N  Complex  Data 


sM 

N  N. 

1 

2 

4 

8  1 

!  52 

125 

256 

16  i 

108 

253 

528 

32 

220 

509 

1072 

64 

444 

1021 

2160 

128  i 

892 

2045 

4336 

256 

1788 

4093 

8188 

512 

3580 

8189 

17392 

8 

16 

32 

— 

-  -  ..  . 

_ 

1018 

_ 

- 

2138 

4030 

- 

4378 

8574 

16006 

8858 

17662 

34310 

17818 

35838 

70918 

35738 

72190 

144134 

TABLE  5.2 

Number  of  Rea /  Multiplications  Required  to  Derive  an  M^ -order  Burg  PEF  from  N  Complex  Data 


sM 

1 

2 

4 

N  \ 

8  ' 

125 

218 

380 

16  ! 

253 

442 

796 

32  1 

1 

509 

890 

1628 

64 

1021 

1786 

3292 

128 

2045 

3578 

6620 

256  ! 

4093 

7172 

13276 

512 

8189 

14330 

26588 

8 

16 

32 

1408 

- 

_ 

3008 

5384 

- 

6208 

11656 

21016 

12608 

24200 

45848 

25408 

49288 

95512 

51008 

99464 

194840 

Estimates  of  the  dynamic  ranges  encountered  in  the  application  of  the 
Burg  algorithm  can  be  made  by  examining  plots  of  spectra  obtained  from  data 
known  to  contain  signals  of  narrow  bandwidth  (e.g.,  [4,10,25]).  It  appears 
that  dynamic  ranges  of  40  to  60  dB  are  common.  Although  there  appear  to 
have  been  no  studies  on  the  magnitudes  of  the  signals  internal  to  an 
implementation  of  the  Burg  algorithm,  it  is  possible  to  speculate  on  their 
expected  magnitudes. 

Whenever  the  input  data  are  highly  correlated,  such  as  when  narrow- 
band  clutter  or  strong  target  echoes  are  present,  it  is  reasonable  to  expect 
that  the  magnitudes  of  the  reflection  coefficients  will  be  close  to  unity 
and  consequently  that  the  magnitudes  of  the  I^s,  f^(n)s  and  bK(n)s  will  also 
decrease  fairly  rapidly  as  a  function  of  K.  As  well,  for  highly  correlated 
inputs  with  intrinsically  narrow  spectra,  it  is  to  be  expected  tha  the  zeroes 
of  the  PEF  which  correspond  to  the  narrow  spectral  lines  will  be  close  to  the 
unit  circle  in  the  complex  Z-plane.  Such  zeroes  produce  very  small  minima  in 
the  denominator  of  eqn.  (2.39). 
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Thus  it  seams  that  there  are  potential  dynamic  range  problems  inherent 
in  the  evaluation  of  all  of  eqns,  (3.35),  (3.36),  (3.37)  and  (3,39).  Although 
further  investigations  are  required  in  order  to  clarify  this  issue,  it  appears 
that,  for  fixed-point  implementations,  word  lengths  greater  than  those 
required  for  satisfactory  operation  of  conventional  finite-length  impulse- 
response  (FIR)  or  DFT  Doppler  processors  may  be  required,  and  tha"t  the  use  of 
floating-point  hardware,  perhaps  with  word-lengths  comparable  to  those  found 
in  large-scale  computers  (o.g.,  32  to  38  bits)  might  bn  necessary. 


5.2  STATISTICAL  PROPERTIES  OF  THE  BURG  ALGORITHM 
5.2.1  Stability  and  Variability 

Because  of  its  nonlinear  formulation,  direct  investigation  of  the 
statistical  properties  of  the  Burg  algorithm  is  extremely  difficult.  For 
this  reason  theoretical  studies  are  usually  confined  to  consideration  of  tho 
effects  of  random  errors  in  estimates  of  the  sampled  autocorrelation  function 
when  spectra  are  estimated  using  tho  methodology  of  Section  2  [2,19]. 

Authors  who  claim  superiority  of  resolution  for  maximum- entropy  methods 
over  the  classical  Fourier  methods  in  the  analysis  of  tima-sorios  amplitude 
data  usually  do  so  on  the  basis  of  comparing  the  effective  number  of  auto¬ 
correlation  samples  (i.a,,  the  order  of  the  PEF  plus  one)  required  to  success¬ 
fully  estimate  the  spectrum  of  the  process  under  investigation,  ns  compared 
with  the  number  of  autocorrelation  samples  required  by  tho  classical  Fourier 
methods  to  achieve  similar  performance  (e.g.,  [2,3,19,20,21]).  However,  with 
the  exception  of  References  [3]  and  [21],  the  number  of  raw  data  contributing 
to  the  spectral  estimates  being  presented  as  reliable  is  very  much  largor 
than  the  orders  of  the  PEFs  derived  from  thorn. 

A  typical  example  is  the  paper  by  Moorcroft  [20]  which  considers  tho 
Doppler  spectra  of  radar  echoes  from  radio  aurora.  Duo  to  its  design,  the 
radar  system  produced  data  in  the  form  of  three  contiguous  but  mutually 
incoherent  blocks  of  27  time-series  samples  each.  A  modified  version  of  the 
Burg  algorithm,  which  took  into  account  the  lack  of  coherence  between  the 
three  data  blocks,  was  used  to  estimate  seventh-order  PEFs  from  sets  of  81 
data.  It  was  found  that,  on  the  average,  where  typical  averages  wore  taken 
over  25  to  150  spactral  estimates,  the  performance  of  this  modified  Burg 
algorithm  was  indeed  superior  to  that  of  classical  Fourier  spectral  estimates 
averaged  in  the  same  manner.  This  was  duo  to  two  fuctors.  The  first  was  the 
systematic  limitation  on  the  length  of  the  sets  of  data  which  could  be 
processed  coherently  (i.e.,  Fourier  transformed),  which  limited  tho  inherent 
resolution  of  the  Fourior  spectra.  The  second  was  tho  fact  that  apparently 
the  physical  process  generating  the  Doppler  spectra  could  be  adequately 
modelled  by  an  all-pole  whitening-filter  model,  as  discussed  in  Soction  1.4, 
However,  it  must  be  noted  that  the  total  number  of  raw  data  contributing  to 
a  single  averaged  spectrum  could  hardly  be  considered  as  limited,  and  the 
author  demonstrated  that  valid  spectral  estimates  could  be  derived  only  by 
averaging,  v 

Similar  studies  have  also  been  undertaken  using  computer-simulated 
clutter  data  [3,4]  and  limited  reel  data  [3]  which  indicate  that  reliable 
clutter  spectral  eatlmatea  may  perhaps  ba  derived  using  record  lengths  as 
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short  as  16  data  and  averaging  over  about  10  independent  sets  of  data  (e.g., 
successive  scans  of  a  surveillance  radar)  obtained  from  a  single  radar 
resolution  cell. 

Doppler  signature  analyses  of  jet-engined  aircraft  have  been  made  from 
the  processing  single  blocks  of  real  (i.e.,  single-channel  as  opposed  to 
quadrature-detected)  data  of  length  N“64  [22],  These  results  were  obtained 
using  an  experimental  short-range  highly  stable  scanning  radar,  and  the  data 
analyzed  were  from  a  range  cell  known  to  contain  a  target. 

5.2.2  Statistical  Distribution  of  Burg  Spectral  Estimates  [4] 

The  only  work  currently  (May  1979)  known  to  this  author  concerning 
estimates  of  the  statistical  distribution  of  Burg  spectral  estimates  is  that 
of  Chan  and  Haykin  [4].  These  investigators  have  performed  extensive 
computer  simulation  studies  of  the  response  of  the  Burg  algorithm  to  input 
data  consisting  of  computer-generated  white  random  noise  of  various 
statistical  distributions,  colored  random  noise  (to  simulate  radar  clutter 
data)  derived  by  filtering  white  gaussian  noise  data,  and  white  gaussian 
noise  plus  a  D.C.  line  (corresponding  to  very  narrow  stationary  clutter),  in 
the  presence  and  absence  of  sinusoidal  signals  of  fixed  frequencies  and  powers 
(corresponding  to  non-fluctuating  Doppler-shifted  targets).  The  salient 
observations  and  conclusions  are  summarized  below;  copies  of  the  complete 
report  are  available  from  its  authors. 

5.2.2. 1  Statistical  Distribution  of  Power  Spectral  Estimates  for  Noise 
and  Clutter  Data 


Experiments  consisting  of  examining  the  statistical  distributions  of 
the  estimated  power  spectral  density  at  various  Doppler  frequencies  for 
different  realizations  of  input  noise  data  (white  or  colored)  were  made.  It 
was  found  in  general  that,  for  white  gaussian  noise,  regardless  of  the  order 
of  the  PEF  derived  or  the  number  of  data  analyzed,  the  estimated  values  of 
the  spectral  density  at  any  given  frequency  were  log-normally  distributed; 
i.e.,  the  logarithms  of  the  spectral  density  estimates  at  any  given  Doppler 
frequency  had  a  gaussian  or  normal  distribution.  Such  a  distribution  has  a 
long  "tail",  so  that  if  use  of  the  Burg  algorithm  as  a  detector  is  comtem- 
plated,  then  detection  thresholds  must  be  set  very  high  in  order  to  achieve 
low  probability  of  false  alarm  (P„.) .  The  fact  that  such  a  probability 
distribution  function  describes  the  processor  output  statistics  also  implies 
that  large  dynamic  range  capability  within  the  processor  digital  hardware  may 
be  required. 

For  white  gaussian  noise  inputs,  it  was  found  that  the  mean  power 
spectral  density  level  increased  as  N,  the  number  of  data,  was  increased,  and 
decreased  as  M,  the  order  of  the  derived  PEF,  was  increased.  Also  it  was 
found  that  the  variance  of  the  power  spectral  density  decreased  as  N  was 
increased,  and  increased  as  M  was  increased. 


Reference  to  eqns.  (3.23)  or  (3.35),  and  (3.41)  suggests  these  results 
are  intuitively  reasonable.  If  successive  data  samples  are  uncorrelated, 
and  have  zero  mean,  then  the  expected  value  for  S(K,K)  is  zero,  and 
<| 6(K,K) | 2>  and  < | 6 (K, K) | 4 >  ought  to  vary  roughly  as  N-2  and  N-^  respectively, 
Examination  of  eqns.  (3.36)  and  (3.37)  shows  that  the  fK(n)s  and  the  bK(n)s 
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ought  also  to  be  gaussian  white  noise.  Then  consideration  of  eqns.  (3.39) 
and  (3.41)  suggests  that,  since  the  expected  values  of  the  PEF  coefficients 
are  zero  for  white  noise  input,  the  fluctuations  in  the  mean  power  level  and 
the  variance  of  spectra  are  due  principally  to  fluctuations  in  the  means  and 
variances  of  the  n^s,  which  in  turn  are  related  directly  to  the  means  and 
variances  of  the  |S(K,K)|2s.  This  discussion  therefore  implies  that,  as 
usual,  better  reliability  is  achieved  as  the  number  of  data  processed  is 
increased. 

For  colored  noise  inputs,  the  statistical  properties  of  the  Burg 
algorithm  became  more  highly  variable.  It  was  found  that  at  frequencies 
where  the  slope  of  the  noise  spectrum  was  significant,  the  tails  of  the 
observed  statistical  distributions  of  the  estimated  power  spectral  densities 
were  consistently  longer  than  those  of  the  corresponding  log-normal  distribu¬ 
tions  which  had  the  same  means  and  variances  as  the  observed  distributions. 
This  again  suggests  the  necessity  of  very  high  threshold  settings  for  accept¬ 
able  PpA,  and  the  requirement  for  large  dynamic  range  in  the  implementation 
of  the  reflection-coefficient  estimator.  Only  in  the  case  of  observations 
made  at  a  Doppler  frequency  where  the  slope  of  the  colored  noise  spectrum 
was  zero  did  a  log-normal  distribution  adequately  describe  the  observed 
statistics  of  the  output  power  spectral  densities. 

5. 2. 2. 2  Doppler  Detection  Characteristics  of  the  Burg  Algorithm 

The  characteristics  of  the  Burg  algorithm  as  a  Doppler  target  detector 
for  detecting  non-fluctuating  targets  in  the  presence  of  interference  in  the 
form  of  low-level  white  noise  (corresponding  to  receiver  noise)  plus  low-pass 
colored  noise  or  a  DC  line  (corresponding  to  clutter)  were  also  investigated 
by  means  of  simulation  studies  in  Ref.  [4],  The  performance  of  the  Burg 
algorithm  was  compared  to  that  of  8-point  and  16-point  DFTs  preceded  by  a 
two-pulse  canceller.  The  detection  threshold  was  set  for  a  PpA  of  10”®  (as 
determined  by  extrapolation  of  the  no-target  simulation  results),  and  a 
probability  of  detection  PD  of  0.9  was  chosen. 

It  was  found  that,  for  the  case  of  the  DC  line-clutter  model  only,  the 
performance  of  the  Burg  algorithm  was  superior  to  that  of  the  canceller-DFT 
processor  operating  on  the  same  data  for  Doppler  frequencies  in  the  range  of 
about  ±0.125  of  the  radar  PRF. 

It  was  suggested  [4]  on  this  basis  that  perhaps  the  Burg  algorithm 
could  provide  some  benefit  for  improved  Doppler  detection  in  the  region  of 
very  narrow-bandwidth  clutter,  but  the  validity  of  the  conjecture  remains  to 
be  demonstrated  for  clutter  with  small  but  finite  bandwidths. 


5.3  INVERSE  CLUTTER-CORRELATION  MATRIX  ESTIMATION  AND  ADAPTIVE  FILTERING 

An  interesting  potential  application  for  the  Burg  algorithm  has 
recently  been  proposed  in  Ref.  [23],  It  is  suggested  there  that  the  Burg 
algorithm  be  used  to  estimate  a  set  of  reflection  coefficients  based  on  data 
containing  clutter  returns  only,  and  that  these  reflection  coefficients  then 
be  used  to  compute  an  estimate  of  the  inverse  of  the  extrapolated  clutter 
correlation  matrix.  This  inverse  matrix  is  used  to  generate  a  set  of  adaptive 
weights  w  for  use  in  a  maximum-likelihood  Doppler  filter  bank,  according  to 
the  well-known  result 


40 


w  <*  [R]_1  u  (5.2) 

where  here  [R]~^  Is  the  estimated  inverse  clutter-correlation  matrix  and  _u  is 
a  frequency-steering  vector 

u+  ■»  |exp[-j  ir (N-l)f At] ,  exp[-j  ir (N-3)f  At] , . . . , 

exp[  jir(N-l)f  At]|  (5.3) 

As  before,  the  superscript  +  denotes  conjugate  transposition,  and  N  is  the 
length  of  the  data  set  and  the  dimension  of  the  inverse  extrapolated  matrix. 

A  technique  is  also  described  for  updating  the  estimated  inverse  matrix. 
The  reflection  coefficient  estimates  are  updated  on  a  data-set  by  data-set 
basis,  using  decaying-memory  digital  integration,  and  periodically  a  new 
estimate  of  the  inverse  matrix  is  made.  (This  averaging  technique  differs 
from  that  described  in  Section  5.2.1:  there  the  numerator  and  denominator 
terms  of  eqn.  (3.35)  are  averaged  before  the  quotient  is  computed;  here  it  is 
the  quotients  that  are  averaged.) 

The  results  obtained  using  simulated  clutter  data  appear  encouraging. 
However,  it  remains  to  demonstrate  the  effectiveness  of  the  technique  using 
actual  radar  data.  Also,  schemes  for  avoiding  the  inadvertent  suppression 
of  wanted  targets  must  be  considered. 


5.4  ANGULAR  SPECTRUM  ESTIMATION 

It  is  well  known  that  the  angle  of  arrival  of  signals  having  a  known 
carrier  frequency  and  incident  on  a  linear  antenna  array  can  be  described  by 
an  angular  spectrum  analogous  to  the  frequency  spectrum  describing  time-domain 
signals.  However,  caution  must  be  exercised  in  pursuing  this  analogy,  for 
the  characteristics  of  the  time  and  spatially  sampled  signals  can  be  quite 
different.  For  example,  multiple  "glints"  from  a  moving  radar  target  (e.g., 
an  aircraft)  as  observed  across  an  antenna  aperture  correspond  at  any  instant 
in  time  to  a  set  of  very  narrow  angular  spectral  lines  with  very  small  angular 
separation,  whereas  the  Doppler  spectrum  from  such  a  glinting  target  will  be 
much  more  complicated,  due  to  quasi-random  amplitude  fading  imposed  on  the 
echoes  by  the  continually  varying  angular  orientation  of  the  moving  target. 
Furthermore,  there  exists  the  potential  for  short-term  time  averaging  in  the 
analysis  of  angular  spectra,  the  analog  of  which  (spatial  diversity)  is 
often  not  available  for  the  analysis  of  Doppler  spectra. 

Unlike  the  case  of  sonar  and  seismic  data  processing,  relatively  little 
work  has  been  published  concerning  the  angular  spectral  analysis  of  radar¬ 
like  signals  using  maximum-entropy  techniques.  In  fact,  only  three  signifi¬ 
cant  and  relevant  papers  [24,25,26]  on  this  topic  have  come  to  the  attention 
of  this  reviewer. 

King  [24]  has  performed  an  interesting  set  of  simulation  studies  which 
show  both  the  beneficial  effects  of  averaging  angular  spectra  and  also  the 
occurrence  of  "line  splitting",  the  troublesome  phenomenon  which  can  occur 
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when  the  Burg  algorithm  is  applied  to  signals  containing  multiple  sinusoidal 
signals  at  high  SNR  [27], 

On  the  experimental  side,  an  attempt  has  been  made  to  apply  the  Burg 
algorithm  to  the  measurement  of  the  elevation  angles  of  arrival  of  multi- 
component  H.F.  (3-30  MHz)  skywaves  [25]  observed  using  a  vertical  antenna 
array  comprised  of  eight  horizontal  loop  elements.  HF  skywave  signals  are 
almost  invariably  afflicted  with  ionospherically-caused  multipath  components, 
and  are  additionally  complicated  by  reflections  from  a  non-uniform  ground- 
plane  in  the  near  field  of  the  antenna.  While  the  Burg  algorithm  was  success¬ 
fully  applied  to  simulated  data  modelled  after  a  theoretical  description  of 
the  expected  data,  the  experimental  data  proved  too  much  for  it,  although 
more  conventional  Fourier  processing  appeared  to  cope  successfully. 

However,  there  exists  one  report  [25]  of  experimentally  successful 
angular  spectrum  estimation  at  1.257  GHz  using  a  12-element  vertical  sampled- 
aperture  array  to  receive  CW  signals  from  an  airborne  beacon  at  a  range  of 
0.4  mi  (0.6  km).  Clearly  the  SNR  of  such  a  signal  would  be  high.  Also,  the 
direct  and  ground-reflected  signals  were  separated  in  angle  by  more  than  the 
classical  beamwidth  of  the  array. 

The  fourth-order  (M=4)  Burg  spectral  estimate  was  compared  to  the 
classical  Fourier  transform  angular  spectrum  and  to  a  maximum-likelihood 
spectral  estimate.  The  Burg  spectrum  was  the  easiest  to  interpret,  consisting 
simply  of  two  spikes  corresponding  to  the  direct  and  reflected  signals.  How¬ 
ever,  although  it  was  not  pointed  out  in  the  report,  the  peak  of  the  Fourier 
transform  spectrum  corresponded  more  closely  to  the  geometrically  expected 
angle  of  arrival  than  did  the  Burg  estimate.  Perhaps  this  was  another 
manifestation  of  the  line-splitting  and  closely  related  line-pulling 
phenomenon  which  is  sometimes  encountered  in  high  SNR  situations. 


6.  SUMMARY  AND  PROGNOSIS 

This  report  has  had  two  main  purposes.  The  first  was  to  present  the 
foundations  and  mathematical  development  of  the  maximum-entropy  (Section  2) 
and  Burg  (Section  3)  algorithms  for  autoregressive  spectral  analysis  in  a 
clear  and  relatively  complete  manner.  Emphasis  was  on  clarity  of  presenta¬ 
tion,  in  order  to  remove  the  apparent  shroud  of  mystery  which  often  seems  to 
surround  these  topics.  For  completeness,  Fourier  transform  techniques  were 
briefly  discussed  in  Section  1.3  and  other  techniques  were  discussed  in 
Section  1.4  and  Section  4. 

The  report’s  second  purpose  was  to  describe  and  discuss  areas  of 
potential  application  of  the  Burg  algorithm  to  radar  signal  processing.  It 
appears  that  there  are  two  areas  of  promise:  clutter  analysis,  classifica¬ 
tion  and/or  suppression;  and  angular  spectrum  (angle-of-arrival)  estimation. 
Preliminary  results  concerning  the  direct  application  of  the  Burg  spectral 
estimate  (eqn.  (3.39))  as  a  target  detector  are  pessimistic:  it  appears  that 
a  white  noise  input  signal  with  gaussian  statistics  gives  rise  to  outputs 
with  log-normal  statistics  in  the  spectral  domain. 


I 
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6.1  CLUTTER  PROCESSING 

It  appears  for  example,  that  if  sufficient  data  are  available  from  a 
scanning  surveillance  radar,  e.g.,  10  scans  of  a  resolution  cell  with  8-20 
hits  per  scan,  then  the  modified  [20]  Burg  algorithm  (see  Section  5.2.1)  can 
be  used  to  obtain  a  single  prediction-error  filter  (PEF)  which  out  to 
reliably  represent  the  clutter  spectrum  from  the  particular  cell  under 
observation.  The  PEF  can  then  be  used  to  form  a  spectral  estimate  which 
characterizes,  and  thus  can  be  used  to  classify,  the  observed  clutter  (e.g., 
ground,  weather,  birds).  Also,  the  PEF  can  be  used  in  a  conventional  digital 
transversal  or  finite-duration  impulse-response  (FIR)  MTI  whitening  filter 
which  would  be  updated  on  a  regular  basis  and  thus  be  adapted  to  suppress  the 
observed  clutter  spectrum,  or  to  adaptively  design  a  maximum-likelihood 
Doppler  filter  bank  [23]  for  the  optimum  detection  of  moving  targets. 

However,  the  computational  complexity  of  the  algorithm,  as  discussed  in 
Section  5.1  et  seq.,  indicates  that  clutter  estimation  could  likely  be  done 
on  selected  cells  only,  so  that  some  sort  of  area  clutter  mapping  procedure 
would  be  required  to  decide  what  cells  to  process,  and  over  what  adjacent 
coverage  areas  the  results  could  be  validly  extrapolated.  This,  the  avoidance 
of  suppressing  wanted  targets,  and  the  selection  of  the  appropriate  order  of 
the  PEF  are  but  three  aspects  of  this  area  of  application  which  require 
further,  experimental  study. 

6.2  ANGULAR  SPECTRUM  (ANGLE-OF-ARRIVAL)  ANALYSIS 

Although  the  Burg  algorithm  as  described  here  is  applicable  only  to 
single-channel  equi-spaced  sampled  data  as  would  be  obtained  from  a  linear 
sampled-aperture  array,  it  appears  that  the  application  of  the  algorithm  to 
such  data  warrants  further  investigation.  In  particular,  the  option  of 
appropriately  time-averaging  individual  "frames"  of  array  data  may  mean  that 
good  statistical  reliability  may  be  obtainable  in  relatively  short  time 
intervals.  However,  it  is  not  clear  at  present  whether  or  in  what  context 
sufficient  benefits  could  be  gained  to  warrant  the  complexity  of  processing 
required  to  implement  the  algorithm. 
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